A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons for cancellations include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is less desirable, and possibly revenue-diminishing, for hotels. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. INN Hotels Group, which has a chain of hotels in Portugal, is facing a high number of booking cancellations and has reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help in formulating profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
# Installing the libraries with the specified version.
!pip install pandas==1.5.3 numpy==1.25.2 matplotlib==3.7.1 seaborn==0.13.1 scikit-learn==1.2.2 statsmodels==0.14.1 -q --user
Note: After running the above cell, kindly restart the notebook kernel and run all cells sequentially from the start again.
# this will help in making the Python code more structured automatically (help adhere to good coding practices)
#%load_ext nb_black
import warnings
warnings.filterwarnings("ignore")
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter("ignore", ConvergenceWarning)
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# setting the precision of floating numbers to 5 decimal points
pd.set_option("display.float_format", lambda x: "%.5f" % x)
# Library to split data
from sklearn.model_selection import train_test_split
# To build model for prediction
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.linear_model import LogisticRegression
# To tune different models
from sklearn.model_selection import GridSearchCV
# To get different metric scores
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    make_scorer,
    roc_auc_score,
    ConfusionMatrixDisplay,
    precision_recall_curve,
    roc_curve,
)
df_main = pd.read_csv("INNHotelsGroup.csv")
# copying data to another variable to avoid any changes to original data
data = df_main.copy()
data.head()
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00000 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68000 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00000 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00000 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50000 | 0 | Canceled |
Checking the unique values of each column with a string data type.
#checking unique values for type meal plan
meal_plan = data['type_of_meal_plan'].unique()
#checking unique values for room_type_reserved
room_type_reserved = data['room_type_reserved'].unique()
#checking unique values for market_segment_type
market_segment_type = data['market_segment_type'].unique()
#checking unique values for booking_status
booking_status = data['booking_status'].unique()
print('type_of_meal_plan values are',meal_plan)
print('room_type_reserved unique values are',room_type_reserved)
print('market_segment_type unique values are',market_segment_type)
print('booking_status unique values are',booking_status)
type_of_meal_plan values are ['Meal Plan 1' 'Not Selected' 'Meal Plan 2' 'Meal Plan 3']
room_type_reserved unique values are ['Room_Type 1' 'Room_Type 4' 'Room_Type 2' 'Room_Type 6' 'Room_Type 5' 'Room_Type 7' 'Room_Type 3']
market_segment_type unique values are ['Offline' 'Online' 'Corporate' 'Aviation' 'Complementary']
booking_status unique values are ['Not_Canceled' 'Canceled']
data['booking_status'].unique()
array(['Not_Canceled', 'Canceled'], dtype=object)
data['booking_status'].value_counts()
Not_Canceled    24390
Canceled        11885
Name: booking_status, dtype: int64
data = data.drop('Booking_ID', axis=1) # Remove 'Booking_ID' column (axis=1 for columns)
data.shape
(36275, 18)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          36275 non-null  int64
 1   no_of_children                        36275 non-null  int64
 2   no_of_weekend_nights                  36275 non-null  int64
 3   no_of_week_nights                     36275 non-null  int64
 4   type_of_meal_plan                     36275 non-null  object
 5   required_car_parking_space            36275 non-null  int64
 6   room_type_reserved                    36275 non-null  object
 7   lead_time                             36275 non-null  int64
 8   arrival_year                          36275 non-null  int64
 9   arrival_month                         36275 non-null  int64
 10  arrival_date                          36275 non-null  int64
 11  market_segment_type                   36275 non-null  object
 12  repeated_guest                        36275 non-null  int64
 13  no_of_previous_cancellations          36275 non-null  int64
 14  no_of_previous_bookings_not_canceled  36275 non-null  int64
 15  avg_price_per_room                    36275 non-null  float64
 16  no_of_special_requests                36275 non-null  int64
 17  booking_status                        36275 non-null  object
dtypes: float64(1), int64(13), object(4)
memory usage: 5.0+ MB
data.isnull().sum()
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
data.duplicated().sum()
10275
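The count above is taken after `Booking_ID` was dropped, so distinct bookings that happen to share every remaining attribute are flagged as duplicates. A minimal sketch on a made-up three-row frame (the values are illustrative and not taken from the dataset) shows the effect:

```python
import pandas as pd

# Toy frame with hypothetical values, not rows from INNHotelsGroup.csv
toy = pd.DataFrame(
    {
        "Booking_ID": ["A1", "A2", "A3"],
        "lead_time": [10, 10, 30],
        "avg_price_per_room": [90.0, 90.0, 120.0],
    }
)

# With the identifier present, every row is unique
dupes_with_id = int(toy.duplicated().sum())

# Without it, the second row repeats the first and is flagged
dupes_without_id = int(toy.drop(columns="Booking_ID").duplicated().sum())

print(dupes_with_id, dupes_without_id)
```

Whether such rows are true duplicates or separate bookings with identical details cannot be decided from the attributes alone, so dropping them is a modelling choice rather than a pure cleaning step.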
data.describe(include='all').T
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| no_of_adults | 36275.00000 | NaN | NaN | NaN | 1.84496 | 0.51871 | 0.00000 | 2.00000 | 2.00000 | 2.00000 | 4.00000 |
| no_of_children | 36275.00000 | NaN | NaN | NaN | 0.10528 | 0.40265 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 10.00000 |
| no_of_weekend_nights | 36275.00000 | NaN | NaN | NaN | 0.81072 | 0.87064 | 0.00000 | 0.00000 | 1.00000 | 2.00000 | 7.00000 |
| no_of_week_nights | 36275.00000 | NaN | NaN | NaN | 2.20430 | 1.41090 | 0.00000 | 1.00000 | 2.00000 | 3.00000 | 17.00000 |
| type_of_meal_plan | 36275 | 4 | Meal Plan 1 | 27835 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| required_car_parking_space | 36275.00000 | NaN | NaN | NaN | 0.03099 | 0.17328 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved | 36275 | 7 | Room_Type 1 | 28130 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| lead_time | 36275.00000 | NaN | NaN | NaN | 85.23256 | 85.93082 | 0.00000 | 17.00000 | 57.00000 | 126.00000 | 443.00000 |
| arrival_year | 36275.00000 | NaN | NaN | NaN | 2017.82043 | 0.38384 | 2017.00000 | 2018.00000 | 2018.00000 | 2018.00000 | 2018.00000 |
| arrival_month | 36275.00000 | NaN | NaN | NaN | 7.42365 | 3.06989 | 1.00000 | 5.00000 | 8.00000 | 10.00000 | 12.00000 |
| arrival_date | 36275.00000 | NaN | NaN | NaN | 15.59700 | 8.74045 | 1.00000 | 8.00000 | 16.00000 | 23.00000 | 31.00000 |
| market_segment_type | 36275 | 5 | Online | 23214 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| repeated_guest | 36275.00000 | NaN | NaN | NaN | 0.02564 | 0.15805 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| no_of_previous_cancellations | 36275.00000 | NaN | NaN | NaN | 0.02335 | 0.36833 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 13.00000 |
| no_of_previous_bookings_not_canceled | 36275.00000 | NaN | NaN | NaN | 0.15341 | 1.75417 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 58.00000 |
| avg_price_per_room | 36275.00000 | NaN | NaN | NaN | 103.42354 | 35.08942 | 0.00000 | 80.30000 | 99.45000 | 120.00000 | 540.00000 |
| no_of_special_requests | 36275.00000 | NaN | NaN | NaN | 0.61966 | 0.78624 | 0.00000 | 0.00000 | 0.00000 | 1.00000 | 5.00000 |
| booking_status | 36275 | 2 | Not_Canceled | 24390 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
data.nunique()
no_of_adults                               5
no_of_children                             6
no_of_weekend_nights                       8
no_of_week_nights                         18
type_of_meal_plan                          4
required_car_parking_space                 2
room_type_reserved                         7
lead_time                                352
arrival_year                               2
arrival_month                             12
arrival_date                              31
market_segment_type                        5
repeated_guest                             2
no_of_previous_cancellations               9
no_of_previous_bookings_not_canceled      59
avg_price_per_room                      3930
no_of_special_requests                     6
booking_status                             2
dtype: int64
data['no_of_children'].value_counts()
0     33577
1      1618
2      1058
3        19
9         2
10        1
Name: no_of_children, dtype: int64
Leading Questions:
data.columns
Index(['no_of_adults', 'no_of_children', 'no_of_weekend_nights',
'no_of_week_nights', 'type_of_meal_plan', 'required_car_parking_space',
'room_type_reserved', 'lead_time', 'arrival_year', 'arrival_month',
'arrival_date', 'market_segment_type', 'repeated_guest',
'no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled',
'avg_price_per_room', 'no_of_special_requests', 'booking_status'],
dtype='object')
# Defining a custom function to plot a combined histogram and box plot
def histogram_boxplot(data, feature, figsize=(15, 10), kde=False, bins=None):
    """
    Boxplot and histogram combined
    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (15,10))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
# function to create labeled barplots
def labeled_barplot(data, feature, perc=True, n=None):
    """
    Barplot with percentage at the top
    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is True)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n],
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the bar with its label
    plt.show()  # show the plot
# Defining function to plot stacked barplot
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart
    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5))
    # A single legend call: an earlier "lower left" legend was immediately replaced by this one
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
# function to plot distributions wrt target
def distribution_plot_wrt_target(data, predictor, target):
    fig, axs = plt.subplots(2, 2, figsize=(12, 10))
    target_uniq = data[target].unique()
    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        stat="density",
    )
    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        stat="density",
    )
    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")
    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )
    plt.tight_layout()
    plt.show()
# Check for continuous data types
continuous_cols = data.select_dtypes(include=[np.number])  # Select numeric columns
# Loop through continuous columns
for col in continuous_cols.columns:
    # histogram_boxplot creates its own figure, so no separate plt.figure() call is needed
    histogram_boxplot(data, col)  # Pass the column name (feature)
    plt.title(col)  # Use the column name (not its positional index) as the plot title
    plt.show()
Insights
# Check for categorical data types
categorical_cols = data.select_dtypes(include=['object']) # Select object columns (categorical data)
# Loop through categorical columns
for col in categorical_cols.columns:
    # labeled_barplot creates and shows its own figure, so no extra plt.figure() or plt.show() is needed
    labeled_barplot(data, col)  # Plot bar chart of categorical variables
Insights
# creating a list of numerical columns
num_cols = data.select_dtypes(include=np.number).columns.tolist()
#plotting correlation heat map
plt.figure(figsize=(12, 7))
sns.heatmap(
    data[num_cols].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral"
)
plt.show()
plt.figure()
distribution_plot_wrt_target(data,'avg_price_per_room' ,'market_segment_type')
plt.show();
plt.figure()
distribution_plot_wrt_target(data,'no_of_previous_cancellations','repeated_guest')
plt.show();
Repeated guests overwhelmingly account for the highest numbers of previous cancellations, which is expected.
plt.figure()
distribution_plot_wrt_target(data,'no_of_special_requests','repeated_guest')
plt.show();
plt.figure()
distribution_plot_wrt_target(data,'lead_time' ,'booking_status')
plt.show();
plt.figure()
sns.countplot(data, x = 'market_segment_type', hue ='booking_status', )
plt.show();
plt.figure()
sns.countplot(data, x = 'arrival_month', hue ='booking_status', )
plt.show();
numerical_col = data.select_dtypes(include=np.number).columns.tolist()
plt.figure(figsize=(20, 30))
for i, variable in enumerate(numerical_col):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(data[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)
plt.show()
I will treat outliers here. However, model performance will be tested on both the outlier-treated data and the data in which outliers were not treated.
# functions to treat outliers by flooring and capping
def treat_outliers(df, col):
    """
    Treats outliers in a variable
    df: dataframe
    col: dataframe column
    """
    Q1 = df[col].quantile(0.25)  # 25th quantile
    Q3 = df[col].quantile(0.75)  # 75th quantile
    IQR = Q3 - Q1
    Lower_Whisker = Q1 - 1.5 * IQR
    Upper_Whisker = Q3 + 1.5 * IQR
    # all the values smaller than Lower_Whisker will be assigned the value of Lower_Whisker
    # all the values greater than Upper_Whisker will be assigned the value of Upper_Whisker
    df[col] = np.clip(df[col], Lower_Whisker, Upper_Whisker)
    return df
def treat_outliers_all(df, col_list):
    """
    Treat outliers in a list of variables
    df: dataframe
    col_list: list of dataframe columns
    """
    for c in col_list:
        df = treat_outliers(df, c)
    return df
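numerical_col = data.select_dtypes(include=np.number).columns.tolist()
# Note: treat_outliers assigns into the DataFrame it receives and returns that same object,
# so this call also modifies `data`, and `dt` and `data` refer to the same DataFrame.
# Passing `data.copy()` instead would keep an untreated version of `data`.
dt = treat_outliers_all(data, numerical_col)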
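To compare treated and untreated data, the capping has to run on a copy. The sketch below is one way to do that, assuming a hypothetical helper `cap_outliers_iqr` and made-up toy values (neither appears in the notebook). It also illustrates a side effect of IQR capping: a column whose quartiles coincide, such as a mostly-zero binary flag, collapses to a single value.

```python
import pandas as pd


def cap_outliers_iqr(df, cols):
    """Return a capped copy of df; the input frame is left unchanged."""
    out = df.copy()
    for col in cols:
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        iqr = q3 - q1
        # Floor at Q1 - 1.5*IQR and cap at Q3 + 1.5*IQR
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out


# Hypothetical toy data: one skewed numeric column and one mostly-zero flag
toy = pd.DataFrame(
    {
        "lead_time": [5, 20, 40, 60, 80, 400],
        "parking": [0, 0, 0, 0, 0, 1],
    }
)
capped = cap_outliers_iqr(toy, ["lead_time", "parking"])

print(toy["lead_time"].max(), capped["lead_time"].max())
print(capped["parking"].nunique())
```

For `parking`, both quartiles are 0, so the whiskers are both 0 and the single 1 is clipped away. Restricting the treatment to genuinely continuous columns (for example `lead_time` and `avg_price_per_room`) would avoid this.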
# let's look at box plot to see if outliers have been treated or not
plt.figure(figsize=(20, 30))
for i, variable in enumerate(numerical_col):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(dt[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)
plt.show()
data['booking_status'].value_counts()
Not_Canceled    24390
Canceled        11885
Name: booking_status, dtype: int64
dt['booking_status'].value_counts()
Not_Canceled    24390
Canceled        11885
Name: booking_status, dtype: int64
data1 = data.copy()
Encoding the booking_status variable: Canceled becomes 1 and Not_Canceled becomes 0.
# Converting the booking status labels to numeric values for the data set without treated outliers
data1['booking_status'] = data1['booking_status'].replace({'Canceled': 1, 'Not_Canceled': 0})
data1['booking_status'].value_counts()
0    24390
1    11885
Name: booking_status, dtype: int64
# Converting the booking status labels to numeric values for the data set with treated outliers
dt['booking_status'] = dt['booking_status'].replace({'Canceled': 1, 'Not_Canceled': 0})
dt['booking_status'].value_counts()
0    24390
1    11885
Name: booking_status, dtype: int64
Creating dummy variables for the following columns: type_of_meal_plan, room_type_reserved, and market_segment_type, then dropping the original columns.
import pandas as pd
def create_dummies(data, specific_cols):
    """
    Creates dummy variables for specific columns in a pandas DataFrame.
    Args:
        data: The pandas DataFrame.
        specific_cols (list): A list of column names to convert to dummy variables.
    Returns:
        pandas.DataFrame: The DataFrame with new dummy variables.
    """
    # Create dummy variables for the specified columns
    dummies = pd.get_dummies(data[specific_cols], drop_first=True)
    # Drop the original columns from the DataFrame
    data = data.drop(columns=specific_cols)
    # Combine original data without the specified columns with dummy variables
    return pd.concat([data, dummies], axis=1)
specific_cols = ['type_of_meal_plan', 'room_type_reserved', 'market_segment_type'] # Columns to convert to dummies
df = create_dummies(data1.copy(), specific_cols) # Saving data (outliers not treated) with dummy variables
# saving dt (outlier-treated) with dummy variables
dt1 = create_dummies(dt.copy(), specific_cols)
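With `drop_first=True`, the alphabetically first level of each categorical column becomes the implicit baseline and gets no column of its own. A small sketch on made-up rows (the frame `toy` is hypothetical) shows which level disappears:

```python
import pandas as pd

# Hypothetical toy rows using three of the market segment labels
toy = pd.DataFrame(
    {"market_segment_type": ["Aviation", "Online", "Offline", "Online"]}
)

# Levels are sorted, so "Aviation" is dropped and acts as the baseline
dummies = pd.get_dummies(toy, columns=["market_segment_type"], drop_first=True)
dummy_columns = list(dummies.columns)

# The "Aviation" row is encoded as all zeros in the remaining columns
aviation_row_sum = int(dummies.iloc[0].sum())

print(dummy_columns, aviation_row_sum)
```

This is why the encoded frames contain no `market_segment_type_Aviation`, `room_type_reserved_Room_Type 1`, or `type_of_meal_plan_Meal Plan 1` column: coefficients for the remaining levels are read relative to those baselines.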
Checking data frame after dummy variable creation
df
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | 0 | 224.00000 | 2018 | 10 | 2 | 0 | 0 | 0 | 65.00000 | 0.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 2 | 0 | 2 | 3 | 0 | 5.00000 | 2018 | 11 | 6 | 0 | 0 | 0 | 106.68000 | 1.00000 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 2 | 0 | 2 | 1 | 0 | 1.00000 | 2018 | 2 | 28 | 0 | 0 | 0 | 60.00000 | 0.00000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 2 | 0 | 0 | 2 | 0 | 211.00000 | 2018 | 5 | 20 | 0 | 0 | 0 | 100.00000 | 0.00000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 2 | 0 | 1 | 1 | 0 | 48.00000 | 2018 | 4 | 11 | 0 | 0 | 0 | 94.50000 | 0.00000 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 36270 | 2 | 0 | 2 | 6 | 0 | 85.00000 | 2018 | 8 | 3 | 0 | 0 | 0 | 167.80000 | 1.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 36271 | 2 | 0 | 1 | 3 | 0 | 228.00000 | 2018 | 10 | 17 | 0 | 0 | 0 | 90.95000 | 2.00000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 36272 | 2 | 0 | 2 | 6 | 0 | 148.00000 | 2018 | 7 | 1 | 0 | 0 | 0 | 98.39000 | 2.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 36273 | 2 | 0 | 0 | 3 | 0 | 63.00000 | 2018 | 4 | 21 | 0 | 0 | 0 | 94.50000 | 0.00000 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 36274 | 2 | 0 | 1 | 2 | 0 | 207.00000 | 2018 | 12 | 30 | 0 | 0 | 0 | 161.67000 | 0.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
36275 rows × 28 columns
dt1
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | 0 | 224.00000 | 2018 | 10 | 2 | 0 | 0 | 0 | 65.00000 | 0.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 2 | 0 | 2 | 3 | 0 | 5.00000 | 2018 | 11 | 6 | 0 | 0 | 0 | 106.68000 | 1.00000 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 2 | 0 | 2 | 1 | 0 | 1.00000 | 2018 | 2 | 28 | 0 | 0 | 0 | 60.00000 | 0.00000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 2 | 0 | 0 | 2 | 0 | 211.00000 | 2018 | 5 | 20 | 0 | 0 | 0 | 100.00000 | 0.00000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 2 | 0 | 1 | 1 | 0 | 48.00000 | 2018 | 4 | 11 | 0 | 0 | 0 | 94.50000 | 0.00000 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 36270 | 2 | 0 | 2 | 6 | 0 | 85.00000 | 2018 | 8 | 3 | 0 | 0 | 0 | 167.80000 | 1.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 36271 | 2 | 0 | 1 | 3 | 0 | 228.00000 | 2018 | 10 | 17 | 0 | 0 | 0 | 90.95000 | 2.00000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 36272 | 2 | 0 | 2 | 6 | 0 | 148.00000 | 2018 | 7 | 1 | 0 | 0 | 0 | 98.39000 | 2.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 36273 | 2 | 0 | 0 | 3 | 0 | 63.00000 | 2018 | 4 | 21 | 0 | 0 | 0 | 94.50000 | 0.00000 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 36274 | 2 | 0 | 1 | 2 | 0 | 207.00000 | 2018 | 12 | 30 | 0 | 0 | 0 | 161.67000 | 0.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
36275 rows × 28 columns
df.duplicated().sum()
10615
# Remove all duplicate rows
df = df.drop_duplicates()
df.duplicated().sum()
0
df.describe(include='all').T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| no_of_adults | 25660.00000 | 2.00000 | 0.00000 | 2.00000 | 2.00000 | 2.00000 | 2.00000 | 2.00000 |
| no_of_children | 25660.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_weekend_nights | 25660.00000 | 0.88581 | 0.88374 | 0.00000 | 0.00000 | 1.00000 | 2.00000 | 5.00000 |
| no_of_week_nights | 25660.00000 | 2.23313 | 1.36900 | 0.00000 | 1.00000 | 2.00000 | 3.00000 | 6.00000 |
| required_car_parking_space | 25660.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| lead_time | 25660.00000 | 66.21904 | 67.25142 | 0.00000 | 12.00000 | 44.00000 | 100.00000 | 289.50000 |
| arrival_year | 25660.00000 | 2018.00000 | 0.00000 | 2018.00000 | 2018.00000 | 2018.00000 | 2018.00000 | 2018.00000 |
| arrival_month | 25660.00000 | 7.32720 | 3.13823 | 1.00000 | 5.00000 | 8.00000 | 10.00000 | 12.00000 |
| arrival_date | 25660.00000 | 15.73566 | 8.85085 | 1.00000 | 8.00000 | 16.00000 | 23.00000 | 31.00000 |
| repeated_guest | 25660.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_previous_cancellations | 25660.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_previous_bookings_not_canceled | 25660.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| avg_price_per_room | 25660.00000 | 105.24561 | 33.81945 | 20.75000 | 80.75000 | 100.00000 | 127.00000 | 179.55000 |
| no_of_special_requests | 25660.00000 | 0.72697 | 0.77004 | 0.00000 | 0.00000 | 1.00000 | 1.00000 | 2.50000 |
| booking_status | 25660.00000 | 0.28757 | 0.45264 | 0.00000 | 0.00000 | 0.00000 | 1.00000 | 1.00000 |
| type_of_meal_plan_Meal Plan 2 | 25660.00000 | 0.04373 | 0.20449 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| type_of_meal_plan_Meal Plan 3 | 25660.00000 | 0.00019 | 0.01396 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| type_of_meal_plan_Not Selected | 25660.00000 | 0.17217 | 0.37754 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 2 | 25660.00000 | 0.02202 | 0.14675 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 3 | 25660.00000 | 0.00023 | 0.01529 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 4 | 25660.00000 | 0.20951 | 0.40697 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 5 | 25660.00000 | 0.00892 | 0.09405 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 6 | 25660.00000 | 0.03648 | 0.18748 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 7 | 25660.00000 | 0.00604 | 0.07749 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| market_segment_type_Complementary | 25660.00000 | 0.01325 | 0.11435 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| market_segment_type_Corporate | 25660.00000 | 0.05171 | 0.22145 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| market_segment_type_Offline | 25660.00000 | 0.15877 | 0.36547 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| market_segment_type_Online | 25660.00000 | 0.77241 | 0.41929 | 0.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000 |
dt1.duplicated().sum()
10615
# Remove all duplicate rows
dt1 = dt1.drop_duplicates()
dt1.duplicated().sum()
0
dt1.describe(include='all').T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| no_of_adults | 25660.00000 | 2.00000 | 0.00000 | 2.00000 | 2.00000 | 2.00000 | 2.00000 | 2.00000 |
| no_of_children | 25660.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_weekend_nights | 25660.00000 | 0.88581 | 0.88374 | 0.00000 | 0.00000 | 1.00000 | 2.00000 | 5.00000 |
| no_of_week_nights | 25660.00000 | 2.23313 | 1.36900 | 0.00000 | 1.00000 | 2.00000 | 3.00000 | 6.00000 |
| required_car_parking_space | 25660.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| lead_time | 25660.00000 | 66.21904 | 67.25142 | 0.00000 | 12.00000 | 44.00000 | 100.00000 | 289.50000 |
| arrival_year | 25660.00000 | 2018.00000 | 0.00000 | 2018.00000 | 2018.00000 | 2018.00000 | 2018.00000 | 2018.00000 |
| arrival_month | 25660.00000 | 7.32720 | 3.13823 | 1.00000 | 5.00000 | 8.00000 | 10.00000 | 12.00000 |
| arrival_date | 25660.00000 | 15.73566 | 8.85085 | 1.00000 | 8.00000 | 16.00000 | 23.00000 | 31.00000 |
| repeated_guest | 25660.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_previous_cancellations | 25660.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_previous_bookings_not_canceled | 25660.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| avg_price_per_room | 25660.00000 | 105.24561 | 33.81945 | 20.75000 | 80.75000 | 100.00000 | 127.00000 | 179.55000 |
| no_of_special_requests | 25660.00000 | 0.72697 | 0.77004 | 0.00000 | 0.00000 | 1.00000 | 1.00000 | 2.50000 |
| booking_status | 25660.00000 | 0.28757 | 0.45264 | 0.00000 | 0.00000 | 0.00000 | 1.00000 | 1.00000 |
| type_of_meal_plan_Meal Plan 2 | 25660.00000 | 0.04373 | 0.20449 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| type_of_meal_plan_Meal Plan 3 | 25660.00000 | 0.00019 | 0.01396 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| type_of_meal_plan_Not Selected | 25660.00000 | 0.17217 | 0.37754 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 2 | 25660.00000 | 0.02202 | 0.14675 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 3 | 25660.00000 | 0.00023 | 0.01529 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 4 | 25660.00000 | 0.20951 | 0.40697 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 5 | 25660.00000 | 0.00892 | 0.09405 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 6 | 25660.00000 | 0.03648 | 0.18748 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved_Room_Type 7 | 25660.00000 | 0.00604 | 0.07749 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| market_segment_type_Complementary | 25660.00000 | 0.01325 | 0.11435 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| market_segment_type_Corporate | 25660.00000 | 0.05171 | 0.22145 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| market_segment_type_Offline | 25660.00000 | 0.15877 | 0.36547 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| market_segment_type_Online | 25660.00000 | 0.77241 | 0.41929 | 0.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000 |
import statsmodels.api as sm
# Data with outliers treated (A, B), used for the logistic regression model
A = df.drop(["booking_status"], axis=1)
B = df["booking_status"]
# Data without outliers treated (X, Y), used for the decision tree model
X = dt1.drop(["booking_status"], axis=1)
Y = dt1["booking_status"]
A = sm.add_constant(A)
X = sm.add_constant(X)
# Splitting data without outliers treated into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
X, Y, test_size=0.30, random_state=1
)
# Splitting data with outliers treated into train and test sets
A_train, A_test, B_train, B_test = train_test_split(
A, B, test_size=0.30, random_state=1
)
X.nunique()
| feature | unique values |
|---|---|
| no_of_adults | 1 |
| no_of_children | 1 |
| no_of_weekend_nights | 6 |
| no_of_week_nights | 7 |
| required_car_parking_space | 1 |
| lead_time | 291 |
| arrival_year | 1 |
| arrival_month | 12 |
| arrival_date | 31 |
| repeated_guest | 1 |
| no_of_previous_cancellations | 1 |
| no_of_previous_bookings_not_canceled | 1 |
| avg_price_per_room | 3513 |
| no_of_special_requests | 4 |
| type_of_meal_plan_Meal Plan 2 | 2 |
| type_of_meal_plan_Meal Plan 3 | 2 |
| type_of_meal_plan_Not Selected | 2 |
| room_type_reserved_Room_Type 2 | 2 |
| room_type_reserved_Room_Type 3 | 2 |
| room_type_reserved_Room_Type 4 | 2 |
| room_type_reserved_Room_Type 5 | 2 |
| room_type_reserved_Room_Type 6 | 2 |
| room_type_reserved_Room_Type 7 | 2 |
| market_segment_type_Complementary | 2 |
| market_segment_type_Corporate | 2 |
| market_segment_type_Offline | 2 |
| market_segment_type_Online | 2 |

dtype: int64
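Several columns above have a single unique value (`no_of_adults`, `no_of_children`, `arrival_year`, `repeated_guest`, and others). Such zero-variance columns carry no information for a model, and they are the ones that later show `0` or `NaN` in the VIF tables. The output also lists 27 columns with no `const` entry; as far as I understand the default `has_constant="skip"` behaviour of `sm.add_constant`, no intercept column is added when a constant column is already present, so `arrival_year` effectively plays that role. A minimal sketch for listing constant columns follows. The helper name `constant_columns` and the toy values are assumptions for illustration, not part of the notebook.

```python
import pandas as pd


def constant_columns(frame: pd.DataFrame) -> list:
    """Return the names of columns that hold a single unique value."""
    counts = frame.nunique()
    return counts[counts <= 1].index.tolist()


# Toy data standing in for the predictors (values are made up)
toy = pd.DataFrame(
    {
        "no_of_adults": [2, 2, 2, 2],
        "arrival_year": [2018, 2018, 2018, 2018],
        "lead_time": [5, 40, 100, 12],
    }
)
print(constant_columns(toy))
```

On the real data the call would be `constant_columns(X)`; the returned list is a candidate set of columns to drop before checking VIF.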
print("Shape of Training set with outliers not treated : ", X_train.shape)
print("Shape of test set with outliers not treated : ", X_test.shape)
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Shape of Training set with outliers not treated :  (17962, 27)
Shape of test set with outliers not treated :  (7698, 27)
Percentage of classes in training set:
0    0.71456
1    0.28544
Name: booking_status, dtype: float64
Percentage of classes in test set:
0    0.70746
1    0.29254
Name: booking_status, dtype: float64
print("Shape of Training set with outliers treated : ", A_train.shape)
print("Shape of test set with outliers treated : ", A_test.shape)
print("Percentage of classes in training set:")
print(B_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(B_test.value_counts(normalize=True))
Shape of Training set with outliers treated :  (17962, 27)
Shape of test set with outliers treated :  (7698, 27)
Percentage of classes in training set:
0    0.71456
1    0.28544
Name: booking_status, dtype: float64
Percentage of classes in test set:
0    0.70746
1    0.29254
Name: booking_status, dtype: float64
def histogram_boxplot(data, feature, figsize=(15, 10), kde=False, bins=None):
    """
    Boxplot and histogram combined
    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (15,10))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
# function to create labeled barplots
def labeled_barplot(data, feature, perc=True, n=None):
    """
    Barplot with percentage at the top
    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is True)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n],
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage or count
    plt.show()  # show the plot
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart
    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5))
    # place the legend outside the plot area
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
# function to plot distributions wrt target
def distribution_plot_wrt_target(data, predictor, target):
    fig, axs = plt.subplots(2, 2, figsize=(12, 10))
    target_uniq = data[target].unique()
    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        stat="density",
    )
    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        stat="density",
    )
    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")
    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )
    plt.tight_layout()
    plt.show()
# Select the numeric (continuous) columns
continuous_cols = df.select_dtypes(include=[np.number])
# Plot a combined boxplot and histogram for each numeric column
for col in continuous_cols.columns:
    histogram_boxplot(df, col)  # creates its own figure, so no extra plt.figure() is needed
    plt.suptitle(col)  # use the column name as the figure title
    plt.show()
Observation
After the pre-modeling steps, there are some minor changes to some features.
The dataset now contains only new guests; the repeat guests were mostly removed as duplicates.
None of the remaining bookings include children.
There are also no previous cancellations, because all the remaining bookings are from new guests.
# Select the categorical (object) columns
categorical_cols = df.select_dtypes(include=["object"])
# Plot a labeled bar chart for each categorical column
for col in categorical_cols.columns:
    labeled_barplot(df, col)  # creates its own figure and calls plt.show()
labeled_barplot(df, "booking_status", perc=True, n=None)
# creating a list of numerical columns
num_cols = df.select_dtypes(include=np.number).columns.tolist()
#plotting correlation heat map
plt.figure(figsize=(12, 7))
sns.heatmap(
df[num_cols].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()
# Plotting arrival month vs booking status
plt.figure()
sns.countplot(data=df, x="arrival_month", hue="booking_status")
plt.show()
# Plotting number of special requests vs booking status
plt.figure()
sns.countplot(data=df, x="no_of_special_requests", hue="booking_status")
plt.show()
# Plotting the online market segment vs booking status
plt.figure()
sns.countplot(data=df, x="market_segment_type_Online", hue="booking_status")
plt.show()
# Plotting average room price vs booking status
plt.figure()
sns.boxplot(data=df, y="avg_price_per_room", x="booking_status")
plt.show()
# Plotting lead time vs booking status
plt.figure()
sns.boxplot(data=df, y="lead_time", x="booking_status")
plt.show()
from statsmodels.stats.outliers_influence import variance_inflation_factor

def checking_vif(predictors):
    vif = pd.DataFrame()
    vif["feature"] = predictors.columns
    # calculating VIF for each feature
    vif["VIF"] = [
        variance_inflation_factor(predictors.values, i)
        for i in range(len(predictors.columns))
    ]
    return vif

checking_vif(X_train)
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 0.00000 |
| 1 | no_of_children | NaN |
| 2 | no_of_weekend_nights | 1.05382 |
| 3 | no_of_week_nights | 1.12324 |
| 4 | required_car_parking_space | NaN |
| 5 | lead_time | 1.16547 |
| 6 | arrival_year | 0.00000 |
| 7 | arrival_month | 1.05959 |
| 8 | arrival_date | 1.00444 |
| 9 | repeated_guest | NaN |
| 10 | no_of_previous_cancellations | NaN |
| 11 | no_of_previous_bookings_not_canceled | NaN |
| 12 | avg_price_per_room | 1.92621 |
| 13 | no_of_special_requests | 1.14705 |
| 14 | type_of_meal_plan_Meal Plan 2 | 1.08452 |
| 15 | type_of_meal_plan_Meal Plan 3 | 1.00836 |
| 16 | type_of_meal_plan_Not Selected | 1.21694 |
| 17 | room_type_reserved_Room_Type 2 | 1.03537 |
| 18 | room_type_reserved_Room_Type 3 | 1.00117 |
| 19 | room_type_reserved_Room_Type 4 | 1.29260 |
| 20 | room_type_reserved_Room_Type 5 | 1.03515 |
| 21 | room_type_reserved_Room_Type 6 | 1.27504 |
| 22 | room_type_reserved_Room_Type 7 | 1.05672 |
| 23 | market_segment_type_Complementary | 4.70633 |
| 24 | market_segment_type_Corporate | 14.84988 |
| 25 | market_segment_type_Offline | 38.33334 |
| 26 | market_segment_type_Online | 49.69074 |
checking_vif(A_train)
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 0.00000 |
| 1 | no_of_children | NaN |
| 2 | no_of_weekend_nights | 1.05382 |
| 3 | no_of_week_nights | 1.12324 |
| 4 | required_car_parking_space | NaN |
| 5 | lead_time | 1.16547 |
| 6 | arrival_year | 0.00000 |
| 7 | arrival_month | 1.05959 |
| 8 | arrival_date | 1.00444 |
| 9 | repeated_guest | NaN |
| 10 | no_of_previous_cancellations | NaN |
| 11 | no_of_previous_bookings_not_canceled | NaN |
| 12 | avg_price_per_room | 1.92621 |
| 13 | no_of_special_requests | 1.14705 |
| 14 | type_of_meal_plan_Meal Plan 2 | 1.08452 |
| 15 | type_of_meal_plan_Meal Plan 3 | 1.00836 |
| 16 | type_of_meal_plan_Not Selected | 1.21694 |
| 17 | room_type_reserved_Room_Type 2 | 1.03537 |
| 18 | room_type_reserved_Room_Type 3 | 1.00117 |
| 19 | room_type_reserved_Room_Type 4 | 1.29260 |
| 20 | room_type_reserved_Room_Type 5 | 1.03515 |
| 21 | room_type_reserved_Room_Type 6 | 1.27504 |
| 22 | room_type_reserved_Room_Type 7 | 1.05672 |
| 23 | market_segment_type_Complementary | 4.70633 |
| 24 | market_segment_type_Corporate | 14.84988 |
| 25 | market_segment_type_Offline | 38.33334 |
| 26 | market_segment_type_Online | 49.69074 |
It is important that features with undefined (NaN) or very high VIF values are checked and removed so that the regression model is not negatively affected.
I will treat multicollinearity in the data frame where I treated outliers (A_train, B_train), as this is what I will use to run the logistic regression model.
The data frame where I did not treat outliers (X_train, y_train) will not have its multicollinearity treated, because it will be used to run the decision tree model. Decision trees are robust to outliers and multicollinearity.
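To make the VIF tables easier to read, here is a minimal sketch of the idea behind the statistic: each column is regressed on the remaining columns, and VIF = 1 / (1 - R²). It is an illustration only and not the routine used above; as far as I know, statsmodels' `variance_inflation_factor` does not add an intercept of its own, so its numbers can differ from this sketch (for example, it returned `0` or `NaN` for the constant columns). The function name `vif_sketch` and the toy data are assumptions.

```python
import numpy as np
import pandas as pd


def vif_sketch(frame: pd.DataFrame) -> pd.Series:
    """VIF_i = 1 / (1 - R2_i), with R2_i from regressing column i on the others."""
    values = frame.to_numpy(dtype=float)
    result = {}
    for i, name in enumerate(frame.columns):
        y = values[:, i]
        total_ss = np.sum((y - y.mean()) ** 2)
        if total_ss == 0:
            # A constant column has no variance to explain, so R2 is undefined
            result[name] = np.nan
            continue
        others = np.delete(values, i, axis=1)
        design = np.column_stack([np.ones(len(y)), others])  # intercept + other columns
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid_ss = np.sum((y - design @ coef) ** 2)
        r2 = 1.0 - resid_ss / total_ss
        result[name] = np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)
    return pd.Series(result)


# Toy data (made up): two independent columns, one constant, one exact combination
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "a": rng.normal(size=200),
        "b": rng.normal(size=200),
        "year": np.full(200, 2018.0),
    }
)
toy["a_plus_b"] = toy["a"] + toy["b"]
vif_toy = vif_sketch(toy)
print(vif_toy)
```

The same logic explains the large values for `market_segment_type_Online` and `market_segment_type_Offline`: the market segment dummies sum to almost 1 for every row (their means add up to about 0.996), so together they are nearly collinear with a constant column.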
# Defining a function to check the effect of removing multicollinear columns
def treating_multicollinearity(predictors, target, high_vif_columns):
    """
    Checking the effect of dropping the columns showing high multicollinearity
    on model performance (adj. R-squared and RMSE)
    predictors: independent variables
    target: dependent variable
    high_vif_columns: columns having high VIF
    """
    # empty lists to store adj. R-squared and RMSE values
    adj_r2 = []
    rmse = []
    # build ols models by dropping one of the high VIF columns at a time
    # store the adjusted R-squared and RMSE in the lists defined previously
    for cols in high_vif_columns:
        # defining the new train set
        train = predictors.loc[:, ~predictors.columns.str.startswith(cols)]
        # create the model
        olsmodel = sm.OLS(target, train).fit()
        # adding adj. R-squared and RMSE to the lists
        adj_r2.append(olsmodel.rsquared_adj)
        rmse.append(np.sqrt(olsmodel.mse_resid))
    # creating a dataframe for the results
    temp = pd.DataFrame(
        {
            "col": high_vif_columns,
            "Adj. R-squared after_dropping col": adj_r2,
            "RMSE after dropping col": rmse,
        }
    ).sort_values(by="Adj. R-squared after_dropping col", ascending=False)
    temp.reset_index(drop=True, inplace=True)
    return temp
# Checking the impact on the model when a feature with multicollinearity is removed
col_list1 = ["no_of_children", "required_car_parking_space"]
res1 = treating_multicollinearity(A_train, B_train, col_list1)
res1
| | col | Adj. R-squared after_dropping col | RMSE after dropping col |
|---|---|---|---|
| 0 | no_of_children | 0.31587 | 0.37356 |
| 1 | required_car_parking_space | 0.31587 | 0.37356 |
# Dropping the feature with multicollinearity
col_to_drop1 = "no_of_children"
A_train = A_train.drop(col_to_drop1, axis=1)
A_test = A_test.drop(col_to_drop1, axis=1)
# Check VIF now
vif = checking_vif(A_train)
print("VIF after dropping ", col_to_drop1)
vif
VIF after dropping no_of_children
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 0.00000 |
| 1 | no_of_weekend_nights | 1.05382 |
| 2 | no_of_week_nights | 1.12324 |
| 3 | required_car_parking_space | NaN |
| 4 | lead_time | 1.16547 |
| 5 | arrival_year | 0.00000 |
| 6 | arrival_month | 1.05959 |
| 7 | arrival_date | 1.00444 |
| 8 | repeated_guest | NaN |
| 9 | no_of_previous_cancellations | NaN |
| 10 | no_of_previous_bookings_not_canceled | NaN |
| 11 | avg_price_per_room | 1.92621 |
| 12 | no_of_special_requests | 1.14705 |
| 13 | type_of_meal_plan_Meal Plan 2 | 1.08452 |
| 14 | type_of_meal_plan_Meal Plan 3 | 1.00836 |
| 15 | type_of_meal_plan_Not Selected | 1.21694 |
| 16 | room_type_reserved_Room_Type 2 | 1.03537 |
| 17 | room_type_reserved_Room_Type 3 | 1.00117 |
| 18 | room_type_reserved_Room_Type 4 | 1.29260 |
| 19 | room_type_reserved_Room_Type 5 | 1.03515 |
| 20 | room_type_reserved_Room_Type 6 | 1.27504 |
| 21 | room_type_reserved_Room_Type 7 | 1.05672 |
| 22 | market_segment_type_Complementary | 4.70633 |
| 23 | market_segment_type_Corporate | 14.84988 |
| 24 | market_segment_type_Offline | 38.33334 |
| 25 | market_segment_type_Online | 49.69074 |
# Checking the impact on the model when a feature with multicollinearity is removed
col_list2 = ["repeated_guest", "required_car_parking_space"]
res2 = treating_multicollinearity(A_train, B_train, col_list2)
res2
| | col | Adj. R-squared after_dropping col | RMSE after dropping col |
|---|---|---|---|
| 0 | repeated_guest | 0.31587 | 0.37356 |
| 1 | required_car_parking_space | 0.31587 | 0.37356 |
# Dropping the feature with multicollinearity
col_to_drop2 = "required_car_parking_space"
A_train = A_train.drop(col_to_drop2, axis=1)
A_test = A_test.drop(col_to_drop2, axis=1)
# Check VIF now
vif = checking_vif(A_train)
print("VIF after dropping ", col_to_drop2)
vif
VIF after dropping required_car_parking_space
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 0.00000 |
| 1 | no_of_weekend_nights | 1.05382 |
| 2 | no_of_week_nights | 1.12324 |
| 3 | lead_time | 1.16547 |
| 4 | arrival_year | 0.00000 |
| 5 | arrival_month | 1.05959 |
| 6 | arrival_date | 1.00444 |
| 7 | repeated_guest | NaN |
| 8 | no_of_previous_cancellations | NaN |
| 9 | no_of_previous_bookings_not_canceled | NaN |
| 10 | avg_price_per_room | 1.92621 |
| 11 | no_of_special_requests | 1.14705 |
| 12 | type_of_meal_plan_Meal Plan 2 | 1.08452 |
| 13 | type_of_meal_plan_Meal Plan 3 | 1.00836 |
| 14 | type_of_meal_plan_Not Selected | 1.21694 |
| 15 | room_type_reserved_Room_Type 2 | 1.03537 |
| 16 | room_type_reserved_Room_Type 3 | 1.00117 |
| 17 | room_type_reserved_Room_Type 4 | 1.29260 |
| 18 | room_type_reserved_Room_Type 5 | 1.03515 |
| 19 | room_type_reserved_Room_Type 6 | 1.27504 |
| 20 | room_type_reserved_Room_Type 7 | 1.05672 |
| 21 | market_segment_type_Complementary | 4.70633 |
| 22 | market_segment_type_Corporate | 14.84988 |
| 23 | market_segment_type_Offline | 38.33334 |
| 24 | market_segment_type_Online | 49.69074 |
# Dropping the feature with multicollinearity
col_to_drop3 = "no_of_previous_bookings_not_canceled"
A_train = A_train.drop(col_to_drop3, axis=1)
A_test = A_test.drop(col_to_drop3, axis=1)
# Check VIF now
vif = checking_vif(A_train)
print("VIF after dropping ", col_to_drop3)
vif
VIF after dropping no_of_previous_bookings_not_canceled
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 0.00000 |
| 1 | no_of_weekend_nights | 1.05382 |
| 2 | no_of_week_nights | 1.12324 |
| 3 | lead_time | 1.16547 |
| 4 | arrival_year | 0.00000 |
| 5 | arrival_month | 1.05959 |
| 6 | arrival_date | 1.00444 |
| 7 | repeated_guest | NaN |
| 8 | no_of_previous_cancellations | NaN |
| 9 | avg_price_per_room | 1.92621 |
| 10 | no_of_special_requests | 1.14705 |
| 11 | type_of_meal_plan_Meal Plan 2 | 1.08452 |
| 12 | type_of_meal_plan_Meal Plan 3 | 1.00836 |
| 13 | type_of_meal_plan_Not Selected | 1.21694 |
| 14 | room_type_reserved_Room_Type 2 | 1.03537 |
| 15 | room_type_reserved_Room_Type 3 | 1.00117 |
| 16 | room_type_reserved_Room_Type 4 | 1.29260 |
| 17 | room_type_reserved_Room_Type 5 | 1.03515 |
| 18 | room_type_reserved_Room_Type 6 | 1.27504 |
| 19 | room_type_reserved_Room_Type 7 | 1.05672 |
| 20 | market_segment_type_Complementary | 4.70633 |
| 21 | market_segment_type_Corporate | 14.84988 |
| 22 | market_segment_type_Offline | 38.33334 |
| 23 | market_segment_type_Online | 49.69074 |
# Dropping the feature with multicollinearity
col_to_drop4 = "repeated_guest"
A_train = A_train.drop(col_to_drop4, axis=1)
A_test = A_test.drop(col_to_drop4, axis=1)
# Check VIF now
vif = checking_vif(A_train)
print("VIF after dropping ", col_to_drop4)
vif
VIF after dropping repeated_guest
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 0.00000 |
| 1 | no_of_weekend_nights | 1.05382 |
| 2 | no_of_week_nights | 1.12324 |
| 3 | lead_time | 1.16547 |
| 4 | arrival_year | 0.00000 |
| 5 | arrival_month | 1.05959 |
| 6 | arrival_date | 1.00444 |
| 7 | no_of_previous_cancellations | NaN |
| 8 | avg_price_per_room | 1.92621 |
| 9 | no_of_special_requests | 1.14705 |
| 10 | type_of_meal_plan_Meal Plan 2 | 1.08452 |
| 11 | type_of_meal_plan_Meal Plan 3 | 1.00836 |
| 12 | type_of_meal_plan_Not Selected | 1.21694 |
| 13 | room_type_reserved_Room_Type 2 | 1.03537 |
| 14 | room_type_reserved_Room_Type 3 | 1.00117 |
| 15 | room_type_reserved_Room_Type 4 | 1.29260 |
| 16 | room_type_reserved_Room_Type 5 | 1.03515 |
| 17 | room_type_reserved_Room_Type 6 | 1.27504 |
| 18 | room_type_reserved_Room_Type 7 | 1.05672 |
| 19 | market_segment_type_Complementary | 4.70633 |
| 20 | market_segment_type_Corporate | 14.84988 |
| 21 | market_segment_type_Offline | 38.33334 |
| 22 | market_segment_type_Online | 49.69074 |
# Dropping the feature with multicollinearity
col_to_drop5 = "no_of_previous_cancellations"
A_train = A_train.drop(col_to_drop5, axis=1)
A_test = A_test.drop(col_to_drop5, axis=1)
# Check VIF now
vif = checking_vif(A_train)
print("VIF after dropping ", col_to_drop5)
vif
VIF after dropping no_of_previous_cancellations
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 0.00000 |
| 1 | no_of_weekend_nights | 1.05382 |
| 2 | no_of_week_nights | 1.12324 |
| 3 | lead_time | 1.16547 |
| 4 | arrival_year | 0.00000 |
| 5 | arrival_month | 1.05959 |
| 6 | arrival_date | 1.00444 |
| 7 | avg_price_per_room | 1.92621 |
| 8 | no_of_special_requests | 1.14705 |
| 9 | type_of_meal_plan_Meal Plan 2 | 1.08452 |
| 10 | type_of_meal_plan_Meal Plan 3 | 1.00836 |
| 11 | type_of_meal_plan_Not Selected | 1.21694 |
| 12 | room_type_reserved_Room_Type 2 | 1.03537 |
| 13 | room_type_reserved_Room_Type 3 | 1.00117 |
| 14 | room_type_reserved_Room_Type 4 | 1.29260 |
| 15 | room_type_reserved_Room_Type 5 | 1.03515 |
| 16 | room_type_reserved_Room_Type 6 | 1.27504 |
| 17 | room_type_reserved_Room_Type 7 | 1.05672 |
| 18 | market_segment_type_Complementary | 4.70633 |
| 19 | market_segment_type_Corporate | 14.84988 |
| 20 | market_segment_type_Offline | 38.33334 |
| 21 | market_segment_type_Online | 49.69074 |
# Checking the impact on the model when a feature with multicollinearity is removed
col_list3 = ["market_segment_type_Offline", "market_segment_type_Online"]
res3 = treating_multicollinearity(A_train, B_train, col_list3)
res3
| | col | Adj. R-squared after_dropping col | RMSE after dropping col |
|---|---|---|---|
| 0 | market_segment_type_Online | 0.31571 | 0.37360 |
| 1 | market_segment_type_Offline | 0.31534 | 0.37370 |
# Dropping the feature with multicollinearity
col_to_drop6 = "market_segment_type_Offline"
A_train = A_train.drop(col_to_drop6, axis=1)
A_test = A_test.drop(col_to_drop6, axis=1)
# Check VIF now
vif = checking_vif(A_train)
print("VIF after dropping ", col_to_drop6)
vif
VIF after dropping market_segment_type_Offline
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 0.00000 |
| 1 | no_of_weekend_nights | 1.05376 |
| 2 | no_of_week_nights | 1.12317 |
| 3 | lead_time | 1.16100 |
| 4 | arrival_year | 0.00000 |
| 5 | arrival_month | 1.05950 |
| 6 | arrival_date | 1.00437 |
| 7 | avg_price_per_room | 1.92540 |
| 8 | no_of_special_requests | 1.14605 |
| 9 | type_of_meal_plan_Meal Plan 2 | 1.08341 |
| 10 | type_of_meal_plan_Meal Plan 3 | 1.00835 |
| 11 | type_of_meal_plan_Not Selected | 1.21693 |
| 12 | room_type_reserved_Room_Type 2 | 1.03532 |
| 13 | room_type_reserved_Room_Type 3 | 1.00116 |
| 14 | room_type_reserved_Room_Type 4 | 1.29007 |
| 15 | room_type_reserved_Room_Type 5 | 1.03505 |
| 16 | room_type_reserved_Room_Type 6 | 1.27502 |
| 17 | room_type_reserved_Room_Type 7 | 1.05671 |
| 18 | market_segment_type_Complementary | 1.25006 |
| 19 | market_segment_type_Corporate | 1.34199 |
| 20 | market_segment_type_Online | 1.72943 |
A_test.shape
(7698, 21)
# fitting logistic regression model
logit = sm.Logit(B_train, A_train.astype(float))
lg = logit.fit(disp=False)
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 17962
Model: Logit Df Residuals: 17941
Method: MLE Df Model: 20
Date: Sat, 16 Mar 2024 Pseudo R-squ.: 0.3072
Time: 23:45:18 Log-Likelihood: -7441.7
converged: False LL-Null: -10742.
Covariance Type: nonrobust LLR p-value: 0.000
=====================================================================================================
coef std err z P>|z| [0.025 0.975]
-----------------------------------------------------------------------------------------------------
no_of_adults -2.324e-06 nan nan nan nan nan
no_of_weekend_nights 0.0982 0.023 4.188 0.000 0.052 0.144
no_of_week_nights 0.0801 0.016 5.134 0.000 0.050 0.111
lead_time 0.0164 0.000 46.406 0.000 0.016 0.017
arrival_year -0.0025 nan nan nan nan nan
arrival_month -0.0506 0.007 -7.022 0.000 -0.065 -0.036
arrival_date -0.0010 0.002 -0.439 0.660 -0.006 0.004
avg_price_per_room 0.0191 0.001 21.637 0.000 0.017 0.021
no_of_special_requests -1.5065 0.034 -44.698 0.000 -1.573 -1.440
type_of_meal_plan_Meal Plan 2 -0.0957 0.105 -0.911 0.362 -0.302 0.110
type_of_meal_plan_Meal Plan 3 16.5861 1941.805 0.009 0.993 -3789.282 3822.454
type_of_meal_plan_Not Selected 0.3716 0.057 6.467 0.000 0.259 0.484
room_type_reserved_Room_Type 2 -0.3200 0.138 -2.324 0.020 -0.590 -0.050
room_type_reserved_Room_Type 3 -0.7997 1.802 -0.444 0.657 -4.332 2.733
room_type_reserved_Room_Type 4 -0.0808 0.057 -1.428 0.153 -0.192 0.030
room_type_reserved_Room_Type 5 -0.5014 0.233 -2.156 0.031 -0.957 -0.046
room_type_reserved_Room_Type 6 -0.3529 0.117 -3.011 0.003 -0.583 -0.123
room_type_reserved_Room_Type 7 -0.3479 0.292 -1.190 0.234 -0.921 0.225
market_segment_type_Complementary -28.8719 7813.339 -0.004 0.997 -1.53e+04 1.53e+04
market_segment_type_Corporate 1.2531 0.142 8.802 0.000 0.974 1.532
market_segment_type_Online 2.1726 0.078 27.746 0.000 2.019 2.326
=====================================================================================================
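The summary above reports `converged: False`, and two dummies (`type_of_meal_plan_Meal Plan 3` and `market_segment_type_Complementary`) have very large coefficients with standard errors in the thousands. One common cause of this pattern is quasi-separation: a category in which the target takes only one value. I have not verified that this is what happens here; the sketch below only shows one way to check, using made-up toy data. The helper name `one_sided_dummies` is hypothetical.

```python
import pandas as pd


def one_sided_dummies(frame: pd.DataFrame, target: pd.Series, dummy_cols: list) -> list:
    """Return dummy columns whose rows with value 1 all share a single target class."""
    flagged = []
    for col in dummy_cols:
        classes = target[frame[col] == 1].nunique()
        if classes <= 1:
            flagged.append(col)
    return flagged


# Toy data (made up): every "Complementary" booking has target 0
toy_X = pd.DataFrame(
    {
        "segment_Complementary": [1, 1, 0, 0, 0, 0],
        "segment_Online": [0, 0, 1, 1, 1, 0],
    }
)
toy_y = pd.Series([0, 0, 1, 0, 1, 0], name="booking_status")
print(one_sided_dummies(toy_X, toy_y, ["segment_Complementary", "segment_Online"]))
```

On the notebook data, the equivalent call would pass `A_train`, `B_train`, and the list of dummy column names.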
import statsmodels.api as sm

# initial list of columns
predictors = A_train.copy()
cols = predictors.columns.tolist()
# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
    # defining the train set
    A_train_aux = predictors[cols]
    # fitting the model
    model = sm.Logit(B_train, A_train_aux.astype(float))
    model = model.fit(disp=False)
    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = p_values.max()
    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()
    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break
selected_features = cols
print(selected_features)
['no_of_weekend_nights', 'no_of_week_nights', 'lead_time', 'arrival_year', 'arrival_month', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'market_segment_type_Corporate', 'market_segment_type_Online']
A_train2 = A_train[selected_features]
A_test2 = A_test[selected_features]
# fitting logistic regression model
logit2 = sm.Logit(B_train, A_train2.astype(float))
lg2 = logit2.fit(disp=False)
print(lg2.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 17962
Model: Logit Df Residuals: 17949
Method: MLE Df Model: 12
Date: Sat, 16 Mar 2024 Pseudo R-squ.: 0.3067
Time: 23:47:17 Log-Likelihood: -7446.7
converged: True LL-Null: -10742.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
no_of_weekend_nights 0.0967 0.023 4.128 0.000 0.051 0.143
no_of_week_nights 0.0773 0.015 4.988 0.000 0.047 0.108
lead_time 0.0164 0.000 46.611 0.000 0.016 0.017
arrival_year -0.0025 6.1e-05 -40.768 0.000 -0.003 -0.002
arrival_month -0.0498 0.007 -6.942 0.000 -0.064 -0.036
avg_price_per_room 0.0185 0.001 23.337 0.000 0.017 0.020
no_of_special_requests -1.5070 0.034 -44.718 0.000 -1.573 -1.441
type_of_meal_plan_Not Selected 0.3919 0.056 6.995 0.000 0.282 0.502
room_type_reserved_Room_Type 2 -0.3006 0.137 -2.198 0.028 -0.569 -0.033
room_type_reserved_Room_Type 5 -0.4658 0.231 -2.012 0.044 -0.920 -0.012
room_type_reserved_Room_Type 6 -0.2917 0.111 -2.634 0.008 -0.509 -0.075
market_segment_type_Corporate 1.2690 0.142 8.961 0.000 0.991 1.547
market_segment_type_Online 2.1820 0.077 28.437 0.000 2.032 2.332
==================================================================================================
Converting coefficients to odds on the training data.
# converting coefficients to odds
odds = np.exp(lg2.params)
# finding the percentage change
perc_change_odds = (np.exp(lg2.params) - 1) * 100
# removing limit from number of columns to display
pd.set_option("display.max_columns", None)
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index= A_train2.columns).T
| | no_of_weekend_nights | no_of_week_nights | lead_time | arrival_year | arrival_month | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | market_segment_type_Corporate | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds | 1.10149 | 1.08037 | 1.01651 | 0.99752 | 0.95145 | 1.01866 | 0.22157 | 1.47979 | 0.74036 | 0.62762 | 0.74699 | 3.55721 | 8.86423 |
| Change_odd% | 10.14875 | 8.03692 | 1.65079 | -0.24843 | -4.85468 | 1.86632 | -77.84298 | 47.97938 | -25.96418 | -37.23799 | -25.30128 | 255.72052 | 786.42339 |
# converting coefficients to odds
odds = np.exp(lg2.params)
# finding the percentage change
perc_change_odds = (np.exp(lg2.params) - 1) * 100
# removing limit from number of columns to display
pd.set_option("display.max_columns", None)
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index= A_test2.columns).T
| | no_of_weekend_nights | no_of_week_nights | lead_time | arrival_year | arrival_month | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | market_segment_type_Corporate | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds | 1.10149 | 1.08037 | 1.01651 | 0.99752 | 0.95145 | 1.01866 | 0.22157 | 1.47979 | 0.74036 | 0.62762 | 0.74699 | 3.55721 | 8.86423 |
| Change_odd% | 10.14875 | 8.03692 | 1.65079 | -0.24843 | -4.85468 | 1.86632 | -77.84298 | 47.97938 | -25.96418 | -37.23799 | -25.30128 | 255.72052 | 786.42339 |
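As a worked example of how to read the odds table: a coefficient b turns into an odds ratio exp(b), and (exp(b) - 1) * 100 is the percentage change in the odds of cancellation for a one-unit increase in that feature, holding the other features fixed. The sketch below reuses three coefficients copied (rounded) from the `lg2` summary; the 30-day scenario is my own illustration, not something computed in the notebook.

```python
import numpy as np

# Coefficients copied from the lg2 summary (rounded to four decimals)
coefs = {
    "lead_time": 0.0164,
    "no_of_special_requests": -1.5070,
    "market_segment_type_Online": 2.1820,
}
odds_ratio = {name: float(np.exp(b)) for name, b in coefs.items()}
pct_change = {name: (ratio - 1) * 100 for name, ratio in odds_ratio.items()}

# Effect of a 30-day increase in lead time: scale the coefficient before exponentiating
odds_30_days = float(np.exp(coefs["lead_time"] * 30))

for name in coefs:
    print(f"{name}: odds ratio {odds_ratio[name]:.3f}, change {pct_change[name]:+.1f}%")
print(f"lead_time +30 days: odds ratio {odds_30_days:.3f}")
```

Note that effects multiply on the odds scale, so the 30-day effect is exp(30b) and not 30 times the one-day percentage change.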
From this model:
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification_statsmodels(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    # checking which probabilities are greater than threshold
    pred_temp = model.predict(predictors) > threshold
    # rounding off the above values to get classes
    pred = np.round(pred_temp)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )
    return df_perf
# defining a function to plot the confusion_matrix of a classification model
def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
# plotting the confusion matrix on the training set (the function draws the plot and returns nothing)
confusion_matrix_statsmodels(lg2, A_train2, B_train)
print("Training performance:")
Logistic_regression_model_train_2 = model_performance_classification_statsmodels(lg2, A_train2, B_train)
Logistic_regression_model_train_2
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.81155 | 0.56290 | 0.71613 | 0.63034 |
# plotting the confusion matrix on the test set (the function draws the plot and returns nothing)
confusion_matrix_statsmodels(lg2, A_test2, B_test)
print("Test performance:")
Logistic_regression_model_test_2 = model_performance_classification_statsmodels(lg2, A_test2, B_test)
Logistic_regression_model_test_2
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80060 | 0.56261 | 0.69730 | 0.62276 |
y_scores = lg2.predict(A_train2)
prec, rec, tre = precision_recall_curve(B_train, y_scores)
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
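A threshold of about 0.64 is read off the precision-recall curve and used below. At this cutoff precision is well above recall in the results that follow, so the two are not actually balanced there.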
# setting the threshold
optimal_threshold_curve = 0.64
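Reading a balance point off a plot is easy to get wrong, so the same idea can be checked numerically. The sketch below is dependency-free and uses invented toy labels and scores; the helper names `precision_recall_at` and `most_balanced_threshold` are hypothetical and not part of the notebook. In the notebook itself, the arrays returned by `precision_recall_curve` could be scanned in the same way.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores strictly above `threshold` are labelled 1."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def most_balanced_threshold(y_true, scores, candidates):
    """Return the candidate threshold with the smallest |precision - recall|."""
    def gap(t):
        p, r = precision_recall_at(y_true, scores, t)
        return abs(p - r)
    return min(candidates, key=gap)

# Toy labels and predicted probabilities, invented for illustration
y_toy = [0, 0, 1, 0, 1, 1, 0, 1]
s_toy = [0.10, 0.30, 0.35, 0.45, 0.55, 0.70, 0.80, 0.90]
best = most_balanced_threshold(y_toy, s_toy, [0.2, 0.4, 0.6, 0.8])
```

On this toy data the smallest gap is at 0.4, where precision is 0.6 and recall is 0.75.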
Checking model performance on training set
# creating confusion matrix
confusion_matrix_statsmodels(lg2, A_train2, B_train, threshold=optimal_threshold_curve)
Logistic_regression_model_train_3 = model_performance_classification_statsmodels(
lg2, A_train2, B_train, threshold=optimal_threshold_curve
)
print("Training performance:")
Logistic_regression_model_train_3
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.79618 | 0.38853 | 0.79110 | 0.52112 |
Checking model performance on test data
# creating confusion matrix
confusion_matrix_statsmodels(lg2, A_test2, B_test, threshold=optimal_threshold_curve)
Logistic_regression_model_test_3 = model_performance_classification_statsmodels(
lg2, A_test2, B_test, threshold=optimal_threshold_curve
)
print("Test performance:")
Logistic_regression_model_test_3
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.78748 | 0.38677 | 0.77353 | 0.51569 |
Checking performance on the training set with the threshold recommended by the AUC-ROC curve.
# Optimal threshold as per AUC-ROC curve
# The optimal cut off would be where tpr is high and fpr is low
fpr, tpr, thresholds = roc_curve(B_train, lg2.predict(A_train2))
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.29901073654427324
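The rule used above, picking the threshold where `tpr - fpr` is largest, is often called Youden's J statistic. A dependency-free sketch on invented toy data shows the mechanics; the function name `youden_threshold` is hypothetical, and the `>=` comparison is an assumption about how the cutoff is applied.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximising J = TPR - FPR over the observed scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        # assumption: an observation is predicted positive when score >= threshold
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Toy labels and predicted probabilities, invented for illustration
y_toy = [0, 0, 0, 1, 1, 0, 1, 1]
s_toy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
threshold, j_stat = youden_threshold(y_toy, s_toy)
```

Because this criterion rewards a high true positive rate, it tends to select a lower cutoff than the default 0.5 when the positive class is the minority, which is consistent with the value of about 0.29 found here.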
Checking model performance on training set
# creating confusion matrix
confusion_matrix_statsmodels(
lg2, A_train2, B_train, threshold=optimal_threshold_auc_roc
)
# checking model performance for this model
Logistic_regression_model_train_4 = model_performance_classification_statsmodels(
lg2, A_train2, B_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
Logistic_regression_model_train_4
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.78221 | 0.76204 | 0.59206 | 0.66638 |
Checking model performance on Test data.
logit_roc_auc_test = roc_auc_score(B_test, lg2.predict(A_test2))
fpr, tpr, thresholds = roc_curve(B_test, lg2.predict(A_test2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
# creating confusion matrix
confusion_matrix_statsmodels(lg2, A_test2, B_test, threshold=optimal_threshold_auc_roc)
# checking model performance for this model
Logistic_regression_model_test_4 = model_performance_classification_statsmodels(
lg2, A_test2, B_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
Logistic_regression_model_test_4
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.76929 | 0.75533 | 0.58134 | 0.65701 |
# training performance comparison
models_train_comp_df = pd.concat(
[
Logistic_regression_model_train_2.T,
Logistic_regression_model_train_3.T,
Logistic_regression_model_train_4.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Logistic Regression-default Threshold (0.5)",
"Logistic Regression-0.64 Threshold",
"Logistic Regression-0.29 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Logistic Regression-default Threshold (0.5) | Logistic Regression-0.64 Threshold | Logistic Regression-0.29 Threshold |
|---|---|---|---|
| Accuracy | 0.81155 | 0.79618 | 0.78221 |
| Recall | 0.56290 | 0.38853 | 0.76204 |
| Precision | 0.71613 | 0.79110 | 0.59206 |
| F1 | 0.63034 | 0.52112 | 0.66638 |
# testing performance comparison
models_test_comp_df = pd.concat(
[
Logistic_regression_model_test_2.T,
Logistic_regression_model_test_3.T,
Logistic_regression_model_test_4.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Logistic Regression-default Threshold (0.5)",
"Logistic Regression-0.64 Threshold",
"Logistic Regression-0.29 Threshold",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Logistic Regression-default Threshold (0.5) | Logistic Regression-0.64 Threshold | Logistic Regression-0.29 Threshold |
|---|---|---|---|
| Accuracy | 0.80060 | 0.78748 | 0.76929 |
| Recall | 0.56261 | 0.38677 | 0.75533 |
| Precision | 0.69730 | 0.77353 | 0.58134 |
| F1 | 0.62276 | 0.51569 | 0.65701 |
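One way to read the comparison table is to rank the thresholds by the metric that matters most. The sketch below copies the test-set Recall and F1 values from the table into a plain dictionary; the helper name `best_by` and the dictionary layout are illustrative assumptions, not part of the notebook.

```python
# Test-set metrics copied from the comparison table above
test_metrics = {
    0.50: {"Recall": 0.56261, "F1": 0.62276},
    0.64: {"Recall": 0.38677, "F1": 0.51569},
    0.29: {"Recall": 0.75533, "F1": 0.65701},
}

def best_by(metrics, name):
    """Return the threshold whose value for metric `name` is highest."""
    return max(metrics, key=lambda threshold: metrics[threshold][name])

best_recall_threshold = best_by(test_metrics, "Recall")
best_f1_threshold = best_by(test_metrics, "F1")
```

Both rankings point to the same cutoff on this table, which makes the choice less sensitive to which of the two metrics is prioritized.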
Insight
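The final logistic regression model uses the 0.29 threshold: of the three thresholds compared, it gives the highest Recall and F1 score on both the training and test sets.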
# Convert coefficients to odds ratios
odds = np.exp(lg2.params)
# Percentage change in odds for a one-unit increase in each predictor
perc_change_odds = (odds - 1) * 100
# The 0.29 classification threshold applies to predicted probabilities, not to odds ratios,
# so coefficients are not filtered by it and every predictor is kept
significant_coefficients_df = pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=A_train2.columns)
# Display the DataFrame
significant_coefficients_df.T
| | no_of_weekend_nights | no_of_week_nights | lead_time | arrival_year | arrival_month | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | market_segment_type_Corporate | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds | 1.10149 | 1.08037 | 1.01651 | 0.99752 | 0.95145 | 1.01866 | 0.22157 | 1.47979 | 0.74036 | 0.62762 | 0.74699 | 3.55721 | 8.86423 |
| Change_odd% | 10.14875 | 8.03692 | 1.65079 | -0.24843 | -4.85468 | 1.86632 | -77.84298 | 47.97938 | -25.96418 | -37.23799 | -25.30128 | 255.72052 | 786.42339 |
Interpretations
From this model:
#Building default decision tree
model0 = DecisionTreeClassifier(random_state=1)
model0.fit(X_train, y_train)
DecisionTreeClassifier(random_state=1)
The metrics I will use to evaluate this model are Recall and F1 score. Recall matters because a false negative is a booking predicted as not canceled that is in fact canceled; the hotel is then unprepared for the cancellation, which leads to further business losses. F1 score gives a more balanced assessment of model performance because it combines precision and recall.
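To make the two chosen metrics concrete, the sketch below computes them from the four cells of a confusion matrix. The counts are invented for illustration and the function name `classification_metrics` is hypothetical; the notebook itself relies on scikit-learn's scoring functions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)        # share of actual cancellations that were caught
    precision = tp / (tp + fp)     # share of predicted cancellations that were real
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}

# Invented counts: 100 actual cancellations, 60 of them caught by the model
metrics = classification_metrics(tp=60, fp=20, fn=40, tn=180)
```

In this toy case accuracy is 0.80 even though 40% of cancellations are missed, which is why accuracy alone would be a poor guide for this problem.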
# defining a function to compute different metrics to check performance of a classification model built using sklearn
def model_performance_classification_sklearn(model, predictors, target):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    # predicting using the independent variables
    pred = model.predict(predictors)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )
    return df_perf
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
confusion_matrix_sklearn(model0, X_train, y_train)
decision_tree_perf_train_without = model_performance_classification_sklearn(
model0, X_train, y_train
)
decision_tree_perf_train_without
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.99655 | 0.98791 | 1.00000 | 0.99392 |
confusion_matrix_sklearn(model0, X_test, y_test)
decision_tree_perf_test_without = model_performance_classification_sklearn(
model0, X_test, y_test
)
decision_tree_perf_test_without
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.79150 | 0.64298 | 0.64384 | 0.64341 |
This model does not generalize well to unseen data: Recall drops from 0.99 on the training set to 0.64 on the test set, and F1 score drops from 0.99 to 0.64, which indicates overfitting.
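The size of the train-test gap can be computed directly from the two metric tables above. The helper below is a hypothetical convenience, not part of the notebook; the numbers are the default tree's Recall and F1 values as reported.

```python
def generalization_gap(train_metrics, test_metrics):
    """Per-metric difference between training and test scores."""
    return {
        name: round(train_metrics[name] - test_metrics[name], 5)
        for name in train_metrics
    }

# Default decision tree, values taken from the tables above
gap = generalization_gap(
    {"Recall": 0.98791, "F1": 0.99392},
    {"Recall": 0.64298, "F1": 0.64341},
)
```

A gap of roughly 0.35 on both metrics is what motivates constraining the tree in the later tuning step.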
model1 = DecisionTreeClassifier(random_state=1, class_weight="balanced")
model1.fit(X_train, y_train)
DecisionTreeClassifier(class_weight='balanced', random_state=1)
confusion_matrix_sklearn(model1, X_train, y_train)
decision_tree_perf_train = model_performance_classification_sklearn(
model1, X_train, y_train
)
decision_tree_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.99655 | 1.00000 | 0.98805 | 0.99399 |
confusion_matrix_sklearn(model1, X_test, y_test)
decision_tree_perf_test = model_performance_classification_sklearn(
model1, X_test, y_test
)
decision_tree_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.79488 | 0.63988 | 0.65233 | 0.64604 |
Like the default decision tree, this model does not generalize well to unseen data: Recall drops from 1.00 on the training set to 0.64 on the test set, and F1 score drops from 0.99 to 0.65.
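For context on what `class_weight="balanced"` does: scikit-learn documents the weight of each class as `n_samples / (n_classes * count_of_that_class)`, so the rarer class gets the larger weight. A small sketch of that formula follows; the 70/30 split is invented for illustration and is not the dataset's actual class ratio, and the function name is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weights following the n_samples / (n_classes * class_count) heuristic."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Invented 70/30 class split, for illustration only
weights = balanced_class_weights([0] * 70 + [1] * 30)
```

Reweighting changes how splits are scored but does not limit tree depth, so on its own it would not be expected to close the train-test gap seen here.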
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1)
# Grid of parameters to choose from
parameters = {
"class_weight": [None, "balanced"],
"max_depth": np.arange(2, 7, 2),
"max_leaf_nodes": [50, 75, 150, 250],
"min_samples_split": [10, 30, 50, 70],
}
# Type of scoring used to compare parameter combinations (recall, not accuracy)
recall_scorer = make_scorer(recall_score)
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=recall_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
model2 = estimator
# Fit the best algorithm to the data.
model2.fit(X_train, y_train)
DecisionTreeClassifier(class_weight='balanced', max_depth=2, max_leaf_nodes=50,
                       min_samples_split=10, random_state=1)
#Visualizing confusion matrix for training data
confusion_matrix_sklearn(model2, X_train, y_train)
decision_tree_tune_perf_train = model_performance_classification_sklearn(
model2, X_train, y_train
)
decision_tree_tune_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.64681 | 0.79735 | 0.43522 | 0.56309 |
#Visualizing confusion matrix for test data
confusion_matrix_sklearn(model2, X_test, y_test)
decision_tree_tune_perf_test = model_performance_classification_sklearn(
model2, X_test, y_test
)
decision_tree_tune_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.64315 | 0.80373 | 0.43985 | 0.56856 |
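To get a feel for the cost of the grid search used for this tuned tree, the candidate combinations can be enumerated with the standard library. The grid below mirrors the `parameters` dictionary from the tuning cell, with `np.arange(2, 7, 2)` written out as `[2, 4, 6]`; the variable names are illustrative.

```python
from itertools import product

param_grid = {
    "class_weight": [None, "balanced"],
    "max_depth": [2, 4, 6],  # same values as np.arange(2, 7, 2)
    "max_leaf_nodes": [50, 75, 150, 250],
    "min_samples_split": [10, 30, 50, 70],
}

# Every combination of one value per parameter is one candidate model
combinations = list(product(*param_grid.values()))
n_candidates = len(combinations)
n_cv_fits = n_candidates * 5  # cv=5 folds per candidate
```

The selected tree has `max_depth=2`, which allows at most four leaves, so the `max_leaf_nodes` values in this grid cannot bind for that candidate. Scoring on recall alone also appears to favor this shallow tree at the cost of precision (about 0.44 on the test set).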
# Visualizing the decision tree built (model1, the tree with balanced class weights)
feature_names = list(X_train.columns)
importances = model1.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
model1,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
importances = model1.feature_importances_
importances
array([0.00000000e+00, 0.00000000e+00, 3.94350680e-02, 5.74059421e-02,
0.00000000e+00, 3.15897423e-01, 0.00000000e+00, 7.92628968e-02,
9.53611457e-02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.38250333e-01, 1.22261786e-01, 3.64648127e-03, 0.00000000e+00,
1.18380955e-02, 2.74383483e-03, 0.00000000e+00, 1.24454264e-02,
6.71302943e-04, 2.75637334e-03, 3.02490718e-04, 1.29826380e-04,
1.44599624e-03, 8.18416572e-04, 1.15327162e-01])
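A raw importance array is hard to read without the feature names next to it. The sketch below pairs names with importances and sorts them; the function name `rank_importances` and the toy names and values are hypothetical. In the notebook the inputs would be `feature_names` and `importances`.

```python
def rank_importances(feature_names, importances, top_n=None):
    """Pair names with importances, drop zeros, and sort from most to least important."""
    pairs = [(name, value) for name, value in zip(feature_names, importances) if value > 0]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:top_n] if top_n else pairs

# Toy names and importances, invented for illustration
ranked = rank_importances(["a", "b", "c", "d"], [0.0, 0.5, 0.2, 0.3])
```

Dropping the zero entries is a presentational choice: several features in the array above have an importance of exactly 0, meaning the tree never split on them.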
# Text report showing the rules of a decision tree -
print(tree.export_text(model1, feature_names=feature_names, show_weights=True))
|--- lead_time <= 150.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 92.50 | | | | |--- no_of_weekend_nights <= 3.50 | | | | | |--- arrival_month <= 9.50 | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- avg_price_per_room <= 178.44 | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 91.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 91.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | |--- weights: [247.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 178.44 | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 22.50 | | | | | | | | | | |--- lead_time <= 37.50 | | | | | | | | | | | |--- weights: [0.00, 5.26] class: 1 | | | | | | | | | | |--- lead_time > 37.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- lead_time <= 41.50 | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | |--- avg_price_per_room <= 75.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 75.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- lead_time > 41.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- avg_price_per_room <= 65.20 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room 
> 65.20 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | |--- weights: [15.39, 0.00] class: 0 | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | |--- lead_time <= 27.50 | | | | | | | | |--- avg_price_per_room <= 79.10 | | | | | | | | | |--- avg_price_per_room <= 62.50 | | | | | | | | | | |--- arrival_date <= 7.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 7.00 | | | | | | | | | | | |--- weights: [3.50, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 62.50 | | | | | | | | | | |--- lead_time <= 19.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- lead_time > 19.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- avg_price_per_room > 79.10 | | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | | |--- avg_price_per_room <= 88.10 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 88.10 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | | |--- lead_time <= 14.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- lead_time > 14.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- lead_time > 27.50 | | | | | | | | |--- avg_price_per_room <= 97.50 | | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | | |--- weights: [4.90, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- avg_price_per_room > 97.50 | | | | | | | | | |--- arrival_date <= 24.00 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 3.50] 
class: 1 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- arrival_date > 24.00 | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | |--- arrival_month > 9.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | |--- avg_price_per_room <= 68.25 | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- weights: [13.29, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | |--- weights: [27.99, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 68.25 | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | |--- avg_price_per_room <= 69.75 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | | | |--- avg_price_per_room > 69.75 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | |--- lead_time <= 65.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- arrival_date <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 5.50 | | | | | | | | | | | |--- weights: [135.75, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- lead_time > 65.50 | | | | | | | | | |--- no_of_week_nights <= 
0.50 | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [165.14, 0.00] class: 0 | | | | |--- no_of_weekend_nights > 3.50 | | | | | |--- market_segment_type_Complementary <= 0.50 | | | | | | |--- lead_time <= 9.00 | | | | | | | |--- weights: [0.00, 8.76] class: 1 | | | | | | |--- lead_time > 9.00 | | | | | | | |--- lead_time <= 10.50 | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- lead_time > 10.50 | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | |--- market_segment_type_Complementary > 0.50 | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | |--- lead_time > 92.50 | | | | |--- avg_price_per_room <= 113.53 | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | |--- weights: [16.09, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | |--- lead_time <= 116.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- avg_price_per_room <= 61.12 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 61.12 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [6.30, 0.00] class: 0 | | | | | | | |--- lead_time > 116.50 | | | | | | | | |--- avg_price_per_room <= 96.45 | | | | | | | | | |--- avg_price_per_room 
<= 89.88 | | | | | | | | | | |--- avg_price_per_room <= 75.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 75.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 89.88 | | | | | | | | | | |--- lead_time <= 146.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- lead_time > 146.00 | | | | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | | | |--- avg_price_per_room > 96.45 | | | | | | | | | |--- weights: [16.79, 0.00] class: 0 | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | |--- avg_price_per_room <= 101.02 | | | | | | | |--- weights: [27.29, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 101.02 | | | | | | | |--- avg_price_per_room <= 104.08 | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | |--- avg_price_per_room > 104.08 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | |--- avg_price_per_room > 113.53 | | | | | |--- avg_price_per_room <= 138.25 | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | |--- arrival_month <= 7.00 | | | | | | | | |--- lead_time <= 119.50 | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | |--- lead_time > 119.50 | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | |--- arrival_month > 7.00 | | | | | | | | |--- weights: [0.00, 14.01] class: 1 | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | |--- avg_price_per_room <= 130.54 | | | | | | | | | |--- arrival_date <= 12.00 | | | | | | | | | | |--- avg_price_per_room <= 122.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 122.00 | | | | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | 
| | | | | | | |--- arrival_date > 12.00 | | | | | | | | | | |--- weights: [0.00, 5.26] class: 1 | | | | | | | | |--- avg_price_per_room > 130.54 | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | |--- arrival_month > 9.50 | | | | | | | | |--- weights: [3.50, 0.00] class: 0 | | | | | |--- avg_price_per_room > 138.25 | | | | | | |--- avg_price_per_room <= 166.72 | | | | | | | |--- weights: [4.20, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 166.72 | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 3.50 | | | | |--- avg_price_per_room <= 135.83 | | | | | |--- no_of_week_nights <= 3.50 | | | | | | |--- arrival_date <= 15.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [23.79, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- avg_price_per_room <= 117.60 | | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | | |--- weights: [35.69, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 117.60 | | | | | | | | | | |--- avg_price_per_room <= 118.90 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | | | |--- avg_price_per_room > 118.90 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | |--- arrival_date > 15.50 | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | |--- weights: [55.98, 0.00] class: 0 | | | | | | | |--- 
arrival_date > 22.50 | | | | | | | | |--- avg_price_per_room <= 88.50 | | | | | | | | | |--- weights: [27.29, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 88.50 | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | |--- avg_price_per_room <= 91.10 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 91.10 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | |--- lead_time <= 1.50 | | | | | | | | | | | |--- weights: [6.30, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 1.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | |--- no_of_week_nights > 3.50 | | | | | | |--- lead_time <= 2.50 | | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | | |--- weights: [8.40, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | |--- lead_time > 2.50 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- avg_price_per_room <= 110.33 | | | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | | | |--- weights: [0.00, 7.01] class: 1 | | | | | | | | |--- avg_price_per_room > 110.33 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- arrival_month > 7.50 | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | |--- avg_price_per_room > 135.83 | | | | | |--- lead_time <= 2.50 | | | | | | |--- avg_price_per_room <= 178.78 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- avg_price_per_room <= 161.50 | | | | | | | | | |--- arrival_date <= 29.00 | | | | | | | | | | |--- avg_price_per_room <= 136.09 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | 
| | | |--- avg_price_per_room > 136.09 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- arrival_date > 29.00 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 161.50 | | | | | | | | | |--- avg_price_per_room <= 162.75 | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 162.75 | | | | | | | | | | |--- weights: [9.10, 0.00] class: 0 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- weights: [31.49, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 178.78 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | |--- arrival_date <= 16.50 | | | | | | | | | | |--- weights: [0.00, 5.26] class: 1 | | | | | | | | | |--- arrival_date > 16.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- weights: [5.60, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | |--- weights: [0.00, 8.76] class: 1 | | | | | |--- lead_time > 2.50 | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | 
| | | |--- arrival_month <= 9.00 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | |--- weights: [0.00, 7.01] class: 1 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- weights: [0.00, 12.26] class: 1 | | | | | | | |--- arrival_month > 9.00 | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- no_of_week_nights <= 2.00 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 2.00 | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | |--- lead_time > 3.50 | | | | |--- avg_price_per_room <= 105.28 | | | | | |--- lead_time <= 24.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- weights: [52.48, 0.00] class: 0 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- lead_time <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- lead_time > 5.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- truncated branch of 
depth 4 | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [77.67, 0.00] class: 0 | | | | | |--- lead_time > 24.50 | | | | | | |--- avg_price_per_room <= 56.55 | | | | | | | |--- arrival_date <= 15.50 | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | |--- lead_time <= 51.50 | | | | | | | | | | |--- lead_time <= 42.50 | | | | | | | | | | | |--- weights: [4.20, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 42.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 51.50 | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | |--- avg_price_per_room <= 27.98 | | | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 27.98 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- arrival_date > 15.50 | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | |--- weights: [20.29, 0.00] class: 0 | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | |--- lead_time <= 101.50 | | | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | | | | | | | |--- lead_time > 101.50 | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | |--- avg_price_per_room > 56.55 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- avg_price_per_room <= 72.78 | | | | | | | | | |--- lead_time <= 42.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- weights: [16.79, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | 
| | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 42.50 | | | | | | | | | | |--- avg_price_per_room <= 68.10 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 68.10 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | |--- avg_price_per_room > 72.78 | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | | | |--- truncated branch of depth 22 | | | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- avg_price_per_room <= 62.09 | | | | | | | | | |--- avg_price_per_room <= 57.74 | | | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | | | | |--- avg_price_per_room > 57.74 | | | | | | | | | | |--- weights: [4.20, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 62.09 | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | |--- avg_price_per_room <= 64.01 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 64.01 | | | | | | | | | | | |--- truncated branch of depth 21 | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | |--- avg_price_per_room <= 62.78 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 62.78 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | |--- avg_price_per_room > 105.28 | | | | | |--- arrival_month <= 10.50 | | | | | | |--- lead_time <= 9.50 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | |--- 
arrival_date <= 11.50 | | | | | | | | | | |--- avg_price_per_room <= 108.28 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 108.28 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- arrival_date > 11.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | |--- avg_price_per_room <= 106.49 | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 106.49 | | | | | | | | | | |--- weights: [0.00, 28.03] class: 1 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- avg_price_per_room <= 161.83 | | | | | | | | | |--- lead_time <= 4.50 | | | | | | | | | | |--- arrival_date <= 21.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 21.00 | | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | | |--- lead_time > 4.50 | | | | | | | | | | |--- arrival_date <= 17.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 17.00 | | | | | | | | | | | |--- weights: [9.10, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 161.83 | | | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 12.26] class: 1 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 28.50 | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | |--- lead_time > 9.50 | | | | | | | |--- avg_price_per_room <= 167.42 | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 167.02 | | | | | | | | | 
| |--- arrival_month <= 1.50 | | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | | |--- truncated branch of depth 32 | | | | | | | | | |--- avg_price_per_room > 167.02 | | | | | | | | | | |--- no_of_weekend_nights <= 1.00 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.00 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | | |--- weights: [4.90, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | | |--- avg_price_per_room <= 149.02 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 149.02 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 167.42 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | |--- arrival_month > 10.50 | | | | | | |--- lead_time <= 22.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- lead_time <= 7.00 | | | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | | | |--- weights: [3.50, 0.00] class: 0 | | | | | | | | | | |--- 
room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | | |--- lead_time > 7.00 | | | | | | | | | | |--- lead_time <= 19.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 19.50 | | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | |--- weights: [32.19, 0.00] class: 0 | | | | | | |--- lead_time > 22.50 | | | | | | | |--- avg_price_per_room <= 169.05 | | | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | | | |--- lead_time <= 46.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 46.50 | | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | | | |--- lead_time <= 50.50 | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | |--- lead_time > 50.50 | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 169.05 | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | |--- weights: [5.60, 0.00] class: 0 | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | 
| | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- lead_time <= 6.50 | | | | |--- no_of_week_nights <= 5.50 | | | | | |--- avg_price_per_room <= 80.90 | | | | | | |--- weights: [206.42, 0.00] class: 0 | | | | | |--- avg_price_per_room > 80.90 | | | | | | |--- arrival_date <= 19.50 | | | | | | | |--- avg_price_per_room <= 153.50 | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | |--- avg_price_per_room <= 140.00 | | | | | | | | | | |--- avg_price_per_room <= 100.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 100.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 140.00 | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- arrival_date <= 18.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- arrival_date > 18.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [105.66, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 153.50 | | | | | | | | |--- arrival_date <= 13.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- weights: [24.49, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- arrival_date <= 5.50 | | | | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 5.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- arrival_date > 13.50 | | | | | | | | | |--- arrival_date <= 18.50 | | | | | | | | | | |--- arrival_date <= 16.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 16.50 | | | | | | | | | 
| | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_date > 18.50 | | | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | | |--- arrival_date > 19.50 | | | | | | | |--- avg_price_per_room <= 87.01 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- arrival_date <= 26.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 26.00 | | | | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [3.50, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- weights: [6.30, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 87.01 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | | |--- weights: [134.35, 0.00] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 22.50 | | | | | | | | | | | |--- weights: [17.49, 0.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- avg_price_per_room <= 133.17 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | 
| | | |--- weights: [27.99, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 133.17 | | | | | | | | | | |--- arrival_date <= 26.50 | | | | | | | | | | | |--- weights: [6.30, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 26.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | |--- no_of_week_nights > 5.50 | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | |--- arrival_month <= 1.50 | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | |--- arrival_month > 1.50 | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | |--- weights: [0.00, 7.01] class: 1 | | | | | | | |--- arrival_date > 27.50 | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | |--- lead_time > 6.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 91.50 | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | |--- weights: [270.79, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | |--- avg_price_per_room <= 132.00 | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | |--- avg_price_per_room > 132.00 | | | | | | | | | |--- weights: [11.90, 0.00] class: 0 | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | |--- avg_price_per_room <= 97.50 | | | | | | | | |--- weights: [3.50, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 97.50 | | | | | | | | |--- lead_time <= 12.50 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- lead_time > 12.50 | | | | | | | | | |--- lead_time <= 42.50 | | | | | | | | | | |--- weights: [0.70, 1.75] class: 1 | | | | | | | | | |--- lead_time > 42.50 | | | | | | | | | | |--- weights: [0.00, 5.26] class: 1 | | | | | |--- lead_time > 91.50 | | | | | | |--- arrival_month <= 3.50 | | | | | | | |--- weights: [12.60, 0.00] class: 0 | | | | | | |--- 
arrival_month > 3.50 | | | | | | | |--- arrival_date <= 16.50 | | | | | | | | |--- avg_price_per_room <= 65.88 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- lead_time <= 94.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 94.50 | | | | | | | | | | | |--- weights: [0.00, 5.26] class: 1 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 65.88 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- arrival_date <= 11.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 11.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- lead_time <= 110.50 | | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 110.50 | | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 16.50 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- lead_time <= 142.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 142.50 | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- weights: [13.29, 0.00] class: 0 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 118.74 | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | |--- lead_time <= 42.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [62.28, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- no_of_week_nights 
<= 3.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [86.07, 0.00] class: 0 | | | | | | | |--- lead_time > 42.50 | | | | | | | | |--- avg_price_per_room <= 66.49 | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | | |--- weights: [13.29, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- weights: [42.68, 0.00] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [4.90, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 66.49 | | | | | | | | | |--- avg_price_per_room <= 79.26 | | | | | | | | | | |--- avg_price_per_room <= 71.87 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- avg_price_per_room > 71.87 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- avg_price_per_room > 79.26 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 19 | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | |--- lead_time <= 102.00 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- avg_price_per_room <= 106.66 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 106.66 | | | | | 
| | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | |--- lead_time > 102.00 | | | | | | | | |--- avg_price_per_room <= 77.79 | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | |--- avg_price_per_room > 77.79 | | | | | | | | | |--- weights: [3.50, 0.00] class: 0 | | | | | |--- avg_price_per_room > 118.74 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | | |--- avg_price_per_room <= 135.58 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- avg_price_per_room > 135.58 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | | |--- arrival_date <= 2.00 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 2.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- lead_time <= 35.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- lead_time > 35.50 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- lead_time <= 17.50 | | | | | | | | | |--- arrival_date <= 25.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated 
branch of depth 6 | | | | | | | | | |--- arrival_date > 25.50 | | | | | | | | | | |--- lead_time <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 10.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- lead_time > 17.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- lead_time <= 126.00 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- lead_time > 126.00 | | | | | | | | | | | |--- weights: [8.40, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- avg_price_per_room <= 157.95 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- avg_price_per_room > 157.95 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- lead_time <= 102.50 | | | | | | | | |--- weights: [50.38, 0.00] class: 0 | | | | | | | |--- lead_time > 102.50 | | | | | | | | |--- avg_price_per_room <= 179.11 | | | | | | | | | |--- weights: [0.00, 10.51] class: 1 | | | | | | | | |--- avg_price_per_room > 179.11 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- weights: [1345.58, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_special_requests <= 2.25 | | | | | | |--- lead_time <= 8.50 | | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | | |--- arrival_date <= 4.50 | | | | | | | | | |--- lead_time <= 6.50 | | | | | | | | | | |--- weights: [3.50, 0.00] class: 0 | | | | | | | | | |--- lead_time > 6.50 | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | |--- arrival_date > 4.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- weights: [25.89, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 
| | | | | | | | | | |--- weights: [3.50, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | | |--- avg_price_per_room <= 99.57 | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | |--- avg_price_per_room > 99.57 | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | |--- lead_time > 8.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | | |--- weights: [16.09, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- lead_time <= 82.00 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- lead_time > 82.00 | | | | | | | | | | | |--- weights: [5.60, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [6.30, 0.00] class: 0 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- avg_price_per_room <= 122.70 | | | | | | | | | |--- weights: [17.49, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 122.70 | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | |--- arrival_date <= 21.00 | | | | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | | | | | |--- arrival_date > 21.00 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | |--- no_of_special_requests > 2.25 | | | | | | |--- weights: [42.68, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.25 | | | | | |--- arrival_month <= 10.50 | | | | | | |--- arrival_date <= 
26.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- avg_price_per_room <= 76.37 | | | | | | | | | |--- weights: [17.49, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 76.37 | | | | | | | | | |--- avg_price_per_room <= 82.52 | | | | | | | | | | |--- lead_time <= 124.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time > 124.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 82.52 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- lead_time <= 104.00 | | | | | | | | | |--- lead_time <= 97.00 | | | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | | | | | | | |--- lead_time > 97.00 | | | | | | | | | | |--- arrival_date <= 16.50 | | | | | | | | | | | |--- weights: [0.00, 7.01] class: 1 | | | | | | | | | | |--- arrival_date > 16.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time > 104.00 | | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | | |--- weights: [6.30, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | |--- arrival_date > 26.50 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | |--- lead_time <= 131.00 | | | | | | | | | | |--- avg_price_per_room <= 138.60 | | | | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | | | | | |--- avg_price_per_room > 138.60 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 131.00 | | | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | |--- 
arrival_month <= 6.50 | | | | | | | | | | |--- weights: [9.80, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | |--- arrival_month > 7.50 | | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- arrival_date > 28.50 | | | | | | | | | |--- avg_price_per_room <= 79.95 | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | | |--- avg_price_per_room > 79.95 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | |--- arrival_month > 10.50 | | | | | | |--- avg_price_per_room <= 140.49 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_date <= 26.50 | | | | | | | | | |--- lead_time <= 134.50 | | | | | | | | | | |--- avg_price_per_room <= 84.80 | | | | | | | | | | | |--- weights: [0.00, 14.01] class: 1 | | | | | | | | | | |--- avg_price_per_room > 84.80 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 134.50 | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | |--- arrival_date > 26.50 | | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- lead_time <= 100.50 | | | | | | | | 
| |--- weights: [7.70, 0.00] class: 0 | | | | | | | | |--- lead_time > 100.50 | | | | | | | | | |--- arrival_date <= 7.00 | | | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | | | | |--- arrival_date > 7.00 | | | | | | | | | | |--- arrival_date <= 26.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 26.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | |--- avg_price_per_room > 140.49 | | | | | | | |--- no_of_week_nights <= 4.00 | | | | | | | | |--- weights: [6.30, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 4.00 | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | |--- no_of_special_requests > 2.25 | | | | | |--- weights: [51.08, 0.00] class: 0 |--- lead_time > 150.50 | |--- avg_price_per_room <= 99.89 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 272.00 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- avg_price_per_room <= 81.76 | | | | | | |--- lead_time <= 161.50 | | | | | | | |--- weights: [21.69, 0.00] class: 0 | | | | | | |--- lead_time > 161.50 | | | | | | | |--- arrival_date <= 25.50 | | | | | | | | |--- lead_time <= 194.00 | | | | | | | | | |--- avg_price_per_room <= 71.12 | | | | | | | | | | |--- avg_price_per_room <= 61.88 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 61.88 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 71.12 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- lead_time > 194.00 | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | |--- 
lead_time <= 203.50 | | | | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 203.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_date > 25.50 | | | | | | | | |--- weights: [8.40, 0.00] class: 0 | | | | | |--- avg_price_per_room > 81.76 | | | | | | |--- arrival_month <= 9.50 | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | |--- lead_time <= 155.50 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | |--- lead_time > 155.50 | | | | | | | | | |--- weights: [0.00, 7.01] class: 1 | | | | | | | |--- arrival_month > 5.50 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- lead_time <= 243.00 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | | | | | | | |--- lead_time > 243.00 | | | | | | | | | | |--- weights: [4.20, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- weights: [10.50, 0.00] class: 0 | | | | | | |--- arrival_month > 9.50 | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | |--- arrival_month <= 11.00 | | | | | | | | | |--- weights: [0.00, 14.01] class: 1 | | | | | | | | |--- arrival_month > 11.00 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- lead_time <= 151.50 | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | |--- 
no_of_weekend_nights > 0.50 | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | |--- lead_time > 151.50 | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | |--- avg_price_per_room <= 96.45 | | | | | | | | | |--- weights: [42.68, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 96.45 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- lead_time <= 185.00 | | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 185.00 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [6.30, 0.00] class: 0 | | | | | | | |--- arrival_date > 30.50 | | | | | | | | |--- no_of_week_nights <= 3.00 | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 3.00 | | | | | | | | | |--- weights: [0.70, 1.75] class: 1 | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | |--- arrival_date > 21.50 | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | |--- lead_time > 272.00 | | | | |--- avg_price_per_room <= 59.25 | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | |--- avg_price_per_room > 59.25 | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_date <= 24.00 | | | | | | | | |--- arrival_date <= 6.00 | | | | | | | | | |--- avg_price_per_room <= 89.50 | | | | | | | | | | |--- weights: [0.00, 5.26] class: 1 | | | | | | | | | |--- avg_price_per_room > 89.50 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_date > 6.00 | | | | | | | | | |--- avg_price_per_room <= 68.50 | | | | | | | | | | |--- 
weights: [2.10, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 68.50 | | | | | | | | | | |--- avg_price_per_room <= 90.33 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- avg_price_per_room > 90.33 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_date > 24.00 | | | | | | | | |--- weights: [0.00, 5.26] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- no_of_week_nights <= 3.00 | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 3.00 | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | |--- no_of_special_requests <= 1.75 | | | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 11.50 | | | | | | | | | | |--- lead_time <= 282.75 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | | | |--- lead_time > 282.75 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- no_of_special_requests > 1.75 | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- arrival_month > 10.50 | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | |--- weights: [5.60, 0.00] class: 0 | | |--- market_segment_type_Online > 0.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- avg_price_per_room <= 35.22 | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | |--- arrival_date <= 19.50 | | | | | | | |--- weights: [0.00, 5.26] class: 1 | | | | | | |--- arrival_date > 19.50 | | | | | | | |--- arrival_date <= 22.00 | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- arrival_date > 22.00 | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | |--- weights: [4.20, 
0.00] class: 0 | | | | |--- avg_price_per_room > 35.22 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- avg_price_per_room <= 50.87 | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | |--- avg_price_per_room > 50.87 | | | | | | | |--- weights: [0.00, 557.04] class: 1 | | | | | |--- arrival_month > 11.50 | | | | | | |--- arrival_date <= 3.50 | | | | | | | |--- avg_price_per_room <= 75.42 | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 75.42 | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | |--- arrival_date > 3.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- avg_price_per_room <= 76.87 | | | | | | | | | |--- lead_time <= 215.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | | | | |--- lead_time > 215.50 | | | | | | | | | | |--- weights: [0.00, 12.26] class: 1 | | | | | | | | |--- avg_price_per_room > 76.87 | | | | | | | | | |--- lead_time <= 268.75 | | | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | | | | | |--- lead_time > 268.75 | | | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- weights: [0.00, 42.04] class: 1 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- weights: [0.00, 29.78] class: 1 | | | |--- no_of_special_requests > 0.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- lead_time <= 180.50 | | | | | | |--- arrival_month <= 10.50 | | | | | | | |--- lead_time <= 176.00 | | | | | | | | |--- arrival_month <= 9.00 | | | | | | | | | |--- weights: [20.99, 0.00] class: 0 | | | | | | | | |--- arrival_month > 9.00 | | | | | | | | | |--- arrival_date <= 26.00 | | | | | | | | | | |--- weights: [0.70, 0.00] 
class: 0 | | | | | | | | | |--- arrival_date > 26.00 | | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- lead_time > 176.00 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | | |--- weights: [4.90, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | | |--- arrival_month <= 9.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 9.00 | | | | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | | |--- arrival_month > 10.50 | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | | | |--- lead_time > 180.50 | | | | | | |--- no_of_special_requests <= 2.25 | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | |--- weights: [2.10, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- weights: [0.00, 145.39] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | | |--- no_of_special_requests > 2.25 | | | | | | | |--- avg_price_per_room <= 72.00 | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 72.00 | | | | | | | | |--- weights: [4.90, 0.00] class: 0 
| | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- avg_price_per_room <= 81.81 | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | | |--- lead_time <= 153.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 153.50 | | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | | |--- weights: [37.09, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- arrival_date > 24.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- weights: [7.70, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 28.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | |--- avg_price_per_room > 81.81 | | | | | | | |--- avg_price_per_room <= 94.21 | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | |--- lead_time <= 222.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 222.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | |--- arrival_date <= 18.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 18.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- weights: [0.00, 5.26] class: 1 | | | 
| | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- lead_time <= 217.00 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 217.00 | | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | |--- avg_price_per_room > 94.21 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- lead_time <= 172.50 | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 172.50 | | | | | | | | | | |--- arrival_date <= 15.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 15.00 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- avg_price_per_room <= 99.38 | | | | | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 99.38 | | | | | | | | | | |--- lead_time <= 180.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 180.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | |--- arrival_month > 11.50 | | | | | | |--- no_of_special_requests <= 2.25 | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | |--- lead_time <= 224.00 | | | | | | | | | |--- weights: [4.20, 0.00] class: 0 | | | | | | | | |--- lead_time > 224.00 | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | | |--- arrival_date > 8.50 | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | |--- avg_price_per_room <= 80.19 | | | | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 28.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 80.19 | | | | | | | | | | |--- arrival_date <= 26.50 | | 
| | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 26.50 | | | | | | | | | | | |--- weights: [3.50, 0.00] class: 0 | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- weights: [0.00, 8.76] class: 1 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | | |--- no_of_special_requests > 2.25 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- weights: [2.80, 0.00] class: 0 | |--- avg_price_per_room > 99.89 | | |--- no_of_special_requests <= 2.25 | | | |--- arrival_month <= 11.50 | | | | |--- avg_price_per_room <= 100.06 | | | | | |--- no_of_week_nights <= 1.50 | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | |--- weights: [0.00, 1.75] class: 1 | | | | | |--- no_of_week_nights > 1.50 | | | | | | |--- lead_time <= 188.00 | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | |--- weights: [0.70, 1.75] class: 1 | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | |--- weights: [0.00, 3.50] class: 1 | | | | | | | |--- arrival_date > 6.50 | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | |--- lead_time > 188.00 | | | | | | | |--- weights: [0.00, 21.02] class: 1 | | | | |--- avg_price_per_room > 100.06 | | | | | |--- weights: [0.00, 1632.59] class: 1 | | | |--- arrival_month > 11.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | |--- weights: [25.19, 0.00] class: 0 | | | | | |--- type_of_meal_plan_Not Selected > 
0.50 | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- arrival_date <= 24.50 | | | | | | |--- weights: [2.80, 0.00] class: 0 | | | | | |--- arrival_date > 24.50 | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- weights: [0.00, 12.26] class: 1 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- lead_time <= 265.00 | | | | | | | | | |--- lead_time <= 235.50 | | | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | | | |--- weights: [0.70, 0.00] class: 0 | | | | | | | | | |--- lead_time > 235.50 | | | | | | | | | | |--- weights: [1.40, 0.00] class: 0 | | | | | | | | |--- lead_time > 265.00 | | | | | | | | | |--- weights: [0.00, 8.76] class: 1 | | |--- no_of_special_requests > 2.25 | | | |--- weights: [21.69, 0.00] class: 0
# importance of features in the tree building
importances = model1.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
I will use the DecisionTreeClassifier.cost_complexity_pruning_path function to get the effective ccp_alpha values of a fully grown tree. Refitting the tree with each of these values gives a sequence of progressively pruned trees.
From that sequence I will pick the most effective tree, i.e. one that is simpler while its impurity stays at an acceptable level. The goal is to get the highest F1 score across the levels of pruning.
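For reference, the criterion behind minimal cost-complexity pruning, as described in the scikit-learn documentation, can be written as below. The notation is added here for illustration and does not come from this notebook: $R(T)$ is the total sample-weighted impurity of the leaves of tree $T$, $|\widetilde{T}|$ is its number of leaves, and $T_t$ is the subtree rooted at an internal node $t$.

```latex
R_\alpha(T) = R(T) + \alpha\,|\widetilde{T}|,
\qquad
\alpha_{\mathrm{eff}}(t) = \frac{R(t) - R(T_t)}{|\widetilde{T}_t| - 1}
```

The internal node with the smallest effective alpha is pruned first, so a larger ccp_alpha removes more of the tree.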
# Define a Decision Tree Classifier with random state and class weight
clf = DecisionTreeClassifier(random_state=1, class_weight="balanced")
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = abs(path.ccp_alphas), path.impurities
# Print the entire path information (for exploration purposes)
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.00000 | 0.00345 |
| 1 | -0.00000 | 0.00345 |
| 2 | -0.00000 | 0.00345 |
| 3 | -0.00000 | 0.00345 |
| 4 | -0.00000 | 0.00345 |
| ... | ... | ... |
| 1475 | 0.00563 | 0.33245 |
| 1476 | 0.00760 | 0.34005 |
| 1477 | 0.01487 | 0.35492 |
| 1478 | 0.04504 | 0.44500 |
| 1479 | 0.05500 | 0.50000 |
1480 rows × 2 columns
Next, we train a decision tree using the effective alphas. The last value
in ccp_alphas is the alpha value that prunes the whole tree,
leaving the tree, clfs[-1], with one node.
# List to store trained decision tree models
clfs = []
# Iterate over different alpha values obtained from pruning path
for ccp_alpha in ccp_alphas:
    # Create a new Decision Tree Classifier with current alpha and other parameters
    clf = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight="balanced"
    )
    # Train the model on the training data
    clf.fit(X_train, y_train)
    # Append the trained model to the list
    clfs.append(clf)
# Print information about the last model in the list
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.05500091294910969
For the remainder, we remove the last element in
clfs and ccp_alphas, because it is the trivial tree with only one
node. Here we show that the number of nodes and the tree depth decrease as alpha
increases.
# Remove the last model from the list
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
# Extract number of nodes and depth for each model
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
# Create a figure with two subplots
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
# Plot number of nodes vs alpha
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
# Plot tree depth vs alpha
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
# Adjust layout to prevent overlapping elements
fig.tight_layout()
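The shrinking behaviour shown in the two plots can also be checked numerically. The following is a minimal sketch on synthetic data, not on the hotel bookings dataset; the names `X_demo`, `y_demo`, `demo_alphas`, `demo_node_counts` and `demo_depths` are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=1)

# Effective alphas of the fully grown tree. Clipping guards against tiny
# negative values from floating-point error, which ccp_alpha would reject;
# np.unique sorts the values and removes duplicates.
demo_path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_demo, y_demo
)
demo_alphas = np.unique(np.clip(demo_path.ccp_alphas, 0, None))[:-1]  # drop root-only alpha

demo_node_counts = []
demo_depths = []
for alpha in demo_alphas:
    pruned = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha).fit(X_demo, y_demo)
    demo_node_counts.append(pruned.tree_.node_count)
    demo_depths.append(pruned.tree_.max_depth)

# True when neither the node count nor the depth ever grows as alpha increases
sizes_shrink = all(a >= b for a, b in zip(demo_node_counts, demo_node_counts[1:]))
depths_shrink = all(a >= b for a, b in zip(demo_depths, demo_depths[1:]))
print(sizes_shrink, depths_shrink)
```

Because each pruned tree is a subtree of the previous one, both checks are expected to hold for any dataset, which is what the step-shaped curves reflect.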
# Lists to store recall scores for training and testing data
recall_train = []
recall_test = []
# Calculate recall score on training data for each model
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = recall_score(y_train, pred_train)
    recall_train.append(values_train)
# Calculate recall score on testing data for each model
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = recall_score(y_test, pred_test)
    recall_test.append(values_test)
# Lists to store accuracy scores for training and testing data
train_scores = [clf.score(X_train, y_train) for clf in clfs]
test_scores = [clf.score(X_test, y_test) for clf in clfs]
# select the model with the highest recall on the test set
index_best_model = np.argmax(recall_test)
# Get the best model based on the index
best_model = clfs[index_best_model]
# Print details of the best model
print(best_model)
DecisionTreeClassifier(ccp_alpha=0.00023579767770967944,
class_weight='balanced', random_state=1)
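The selection above ranks the pruned trees by test recall. If F1 is the target metric, the same loop can rank the candidates with `f1_score` instead. The sketch below is an assumption-laden illustration on synthetic data: `X_demo`, `y_demo`, the `X_tr`/`X_val` split and `f1_by_alpha` are hypothetical names, and the held-out split stands in for whatever validation data is available.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced stand-in data (assumption: not the bookings data)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, weights=[0.67, 0.33], random_state=1
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=1
)

# Candidate alphas from the pruning path of a fully grown tree
base = DecisionTreeClassifier(random_state=1, class_weight="balanced")
raw_alphas = base.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas
# Clip tiny negative values, sort, dedupe, and drop the root-only alpha
alphas = np.unique(np.clip(raw_alphas, 0, None))[:-1]

# Fit one pruned tree per alpha and score it with F1 on the held-out split
f1_by_alpha = []
for alpha in alphas:
    candidate = DecisionTreeClassifier(
        random_state=1, class_weight="balanced", ccp_alpha=alpha
    ).fit(X_tr, y_tr)
    f1_by_alpha.append(f1_score(y_val, candidate.predict(X_val)))

best_idx = int(np.argmax(f1_by_alpha))
best_alpha = alphas[best_idx]
print("best ccp_alpha by F1:", best_alpha)
```

Choosing alpha on a separate validation split (or with cross-validation) rather than on the test set keeps the final test scores an honest estimate of performance.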
# Confusion matrix for the best model on training data
confusion_matrix_sklearn(best_model, X_train, y_train)
print("Training set performance:")
decision_tree_post_perf_train = model_performance_classification_sklearn(
    best_model, X_train, y_train
)
decision_tree_post_perf_train
Training set performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.81912 | 0.85879 | 0.63554 | 0.73049 |
confusion_matrix_sklearn(best_model, X_test, y_test)
print("Test set performance:")
decision_tree_post_test = model_performance_classification_sklearn(
best_model, X_test, y_test
)
decision_tree_post_test
Test set performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.79956 | 0.81572 | 0.61956 | 0.70424 |
# Plotting image report showing the rules of a decision tree -
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
best_model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(best_model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 150.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 92.50 | | | | |--- no_of_weekend_nights <= 3.50 | | | | | |--- arrival_month <= 9.50 | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- avg_price_per_room <= 178.44 | | | | | | | | | |--- weights: [285.49, 7.01] class: 0 | | | | | | | | |--- avg_price_per_room > 178.44 | | | | | | | | | |--- weights: [2.10, 5.26] class: 1 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- weights: [376.45, 105.10] class: 0 | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | |--- lead_time <= 27.50 | | | | | | | | |--- avg_price_per_room <= 79.10 | | | | | | | | | |--- weights: [139.95, 21.02] class: 0 | | | | | | | | |--- avg_price_per_room > 79.10 | | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | | |--- weights: [39.88, 1.75] class: 0 | | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | | |--- lead_time <= 14.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 14.50 | | | | | | | | | | | |--- weights: [14.69, 1.75] class: 0 | | | | | | | |--- lead_time > 27.50 | | | | | | | | |--- weights: [30.79, 31.53] class: 1 | | | | | |--- arrival_month > 9.50 | | | | | | |--- weights: [585.67, 45.54] class: 0 | | | | |--- no_of_weekend_nights > 3.50 | | | | | |--- weights: [1.40, 10.51] class: 1 | | | |--- lead_time > 92.50 | | | | |--- avg_price_per_room <= 113.53 | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | |--- weights: [16.09, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | |--- lead_time <= 116.50 | | | | | | | | |--- weights: [69.97, 71.82] class: 1 | | | | | | | |--- lead_time > 116.50 | | | | | | | | |--- avg_price_per_room <= 96.45 | | | | | | | | | |--- avg_price_per_room <= 89.88 | | | | | | | 
| | | |--- weights: [54.58, 22.77] class: 0 | | | | | | | | | |--- avg_price_per_room > 89.88 | | | | | | | | | | |--- weights: [7.00, 14.01] class: 1 | | | | | | | | |--- avg_price_per_room > 96.45 | | | | | | | | | |--- weights: [16.79, 0.00] class: 0 | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | |--- weights: [30.09, 1.75] class: 0 | | | | |--- avg_price_per_room > 113.53 | | | | | |--- weights: [13.29, 29.78] class: 1 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 3.50 | | | | |--- avg_price_per_room <= 135.83 | | | | | |--- weights: [298.08, 52.55] class: 0 | | | | |--- avg_price_per_room > 135.83 | | | | | |--- lead_time <= 2.50 | | | | | | |--- avg_price_per_room <= 178.78 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- weights: [31.49, 17.52] class: 0 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- weights: [31.49, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 178.78 | | | | | | | |--- weights: [9.10, 17.52] class: 1 | | | | | |--- lead_time > 2.50 | | | | | | |--- weights: [5.60, 21.02] class: 1 | | | |--- lead_time > 3.50 | | | | |--- avg_price_per_room <= 105.28 | | | | | |--- lead_time <= 24.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- weights: [52.48, 0.00] class: 0 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- weights: [42.68, 54.30] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [24.49, 89.34] class: 1 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | |--- weights: [23.79, 5.26] class: 0 | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | |--- weights: [60.88, 68.32] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [77.67, 0.00] class: 0 | | | | | |--- lead_time > 24.50 
| | | | | | |--- avg_price_per_room <= 56.55 | | | | | | | |--- arrival_date <= 15.50 | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | |--- weights: [11.90, 3.50] class: 0 | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | |--- weights: [4.90, 17.52] class: 1 | | | | | | | |--- arrival_date > 15.50 | | | | | | | | |--- weights: [22.39, 1.75] class: 0 | | | | | | |--- avg_price_per_room > 56.55 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- avg_price_per_room <= 72.78 | | | | | | | | | |--- lead_time <= 42.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- weights: [16.79, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- weights: [2.10, 5.26] class: 1 | | | | | | | | | |--- lead_time > 42.50 | | | | | | | | | | |--- weights: [36.39, 59.56] class: 1 | | | | | | | | |--- avg_price_per_room > 72.78 | | | | | | | | | |--- weights: [216.92, 632.37] class: 1 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- weights: [83.97, 432.67] class: 1 | | | | |--- avg_price_per_room > 105.28 | | | | | |--- arrival_month <= 10.50 | | | | | | |--- lead_time <= 9.50 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- weights: [32.89, 136.63] class: 1 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- avg_price_per_room <= 161.83 | | | | | | | | | |--- weights: [20.99, 8.76] class: 0 | | | | | | | | |--- avg_price_per_room > 161.83 | | | | | | | | | |--- weights: [2.10, 12.26] class: 1 | | | | | | |--- lead_time > 9.50 | | | | | | | |--- weights: [317.68, 2226.42] class: 1 | | | | | |--- arrival_month > 10.50 | | | | | | |--- lead_time <= 22.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- weights: [9.80, 12.26] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [32.89, 0.00] class: 0 | | | | | | |--- lead_time > 22.50 | | | | | | | |--- weights: [30.09, 87.59] class: 1 
| |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- lead_time <= 6.50 | | | | |--- no_of_week_nights <= 5.50 | | | | | |--- weights: [739.61, 68.32] class: 0 | | | | |--- no_of_week_nights > 5.50 | | | | | |--- weights: [3.50, 7.01] class: 1 | | | |--- lead_time > 6.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 91.50 | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | |--- weights: [282.69, 1.75] class: 0 | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | |--- weights: [4.90, 7.01] class: 1 | | | | | |--- lead_time > 91.50 | | | | | | |--- weights: [54.58, 22.77] class: 0 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 118.74 | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | |--- lead_time <= 42.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [62.28, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- weights: [473.72, 176.92] class: 0 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [86.07, 0.00] class: 0 | | | | | | | |--- lead_time > 42.50 | | | | | | | | |--- avg_price_per_room <= 66.49 | | | | | | | | | |--- weights: [62.28, 3.50] class: 0 | | | | | | | | |--- avg_price_per_room > 66.49 | | | | | | | | | |--- avg_price_per_room <= 79.26 | | | | | | | | | | |--- avg_price_per_room <= 71.87 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 71.87 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 79.26 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- 
arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | |--- weights: [5.60, 38.54] class: 1 | | | | | |--- avg_price_per_room > 118.74 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | | |--- avg_price_per_room <= 135.58 | | | | | | | | | | | |--- weights: [97.26, 24.52] class: 0 | | | | | | | | | | |--- avg_price_per_room > 135.58 | | | | | | | | | | | |--- weights: [125.25, 80.58] class: 0 | | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | | |--- weights: [1.40, 8.76] class: 1 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- lead_time <= 35.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- weights: [57.38, 17.52] class: 0 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- weights: [67.87, 71.82] class: 1 | | | | | | | | | |--- lead_time > 35.50 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- weights: [9.80, 36.79] class: 1 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- lead_time <= 17.50 | | | | | | | | | |--- weights: [67.17, 38.54] class: 0 | | | | | | | | |--- lead_time > 17.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- lead_time <= 126.00 | | | | | | | | | | | |--- weights: [128.75, 162.91] class: 1 | | | | | | | | | | |--- lead_time > 126.00 | | | | | | | | | | | |--- weights: [8.40, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [68.57, 147.14] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- lead_time <= 102.50 | | | | | | | | |--- weights: [50.38, 0.00] class: 0 | | | | | | | |--- lead_time > 102.50 
| | | | | | | | |--- weights: [0.70, 10.51] class: 1 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- weights: [1345.58, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_special_requests <= 2.25 | | | | | | |--- lead_time <= 8.50 | | | | | | | |--- weights: [34.29, 3.50] class: 0 | | | | | | |--- lead_time > 8.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | |--- weights: [16.09, 1.75] class: 0 | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | |--- weights: [67.87, 63.06] class: 0 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- weights: [46.18, 12.26] class: 0 | | | | | |--- no_of_special_requests > 2.25 | | | | | | |--- weights: [42.68, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.25 | | | | | |--- arrival_month <= 10.50 | | | | | | |--- arrival_date <= 26.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- avg_price_per_room <= 76.37 | | | | | | | | | |--- weights: [17.49, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 76.37 | | | | | | | | | |--- avg_price_per_room <= 82.52 | | | | | | | | | | |--- weights: [9.10, 15.77] class: 1 | | | | | | | | | |--- avg_price_per_room > 82.52 | | | | | | | | | | |--- weights: [143.44, 36.79] class: 0 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- weights: [9.10, 10.51] class: 1 | | | | | | |--- arrival_date > 26.50 | | | | | | | |--- weights: [23.09, 22.77] class: 0 | | | | | |--- arrival_month > 10.50 | | | | | | |--- weights: [37.09, 45.54] class: 1 | | | | |--- no_of_special_requests > 2.25 | | | | | |--- weights: [51.08, 0.00] class: 0 |--- lead_time > 150.50 | |--- avg_price_per_room <= 99.89 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 272.00 | | | | |--- no_of_special_requests <= 
0.50 | | | | | |--- avg_price_per_room <= 81.76 | | | | | | |--- lead_time <= 161.50 | | | | | | | |--- weights: [21.69, 0.00] class: 0 | | | | | | |--- lead_time > 161.50 | | | | | | | |--- weights: [57.38, 31.53] class: 0 | | | | | |--- avg_price_per_room > 81.76 | | | | | | |--- weights: [23.09, 31.53] class: 1 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- lead_time <= 151.50 | | | | | | |--- weights: [0.70, 3.50] class: 1 | | | | | |--- lead_time > 151.50 | | | | | | |--- weights: [53.88, 5.26] class: 0 | | | |--- lead_time > 272.00 | | | | |--- weights: [32.19, 45.54] class: 1 | | |--- market_segment_type_Online > 0.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- weights: [9.80, 660.39] class: 1 | | | |--- no_of_special_requests > 0.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- lead_time <= 180.50 | | | | | | |--- weights: [32.89, 10.51] class: 0 | | | | | |--- lead_time > 180.50 | | | | | | |--- no_of_special_requests <= 2.25 | | | | | | | |--- weights: [7.00, 157.65] class: 1 | | | | | | |--- no_of_special_requests > 2.25 | | | | | | | |--- weights: [5.60, 0.00] class: 0 | | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- avg_price_per_room <= 81.81 | | | | | | | |--- weights: [59.48, 14.01] class: 0 | | | | | | |--- avg_price_per_room > 81.81 | | | | | | | |--- weights: [107.76, 99.85] class: 0 | | | | | |--- arrival_month > 11.50 | | | | | | |--- weights: [17.49, 36.79] class: 1 | |--- avg_price_per_room > 99.89 | | |--- no_of_special_requests <= 2.25 | | | |--- arrival_month <= 11.50 | | | | |--- weights: [4.20, 1660.62] class: 1 | | | |--- arrival_month > 11.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- weights: [25.89, 0.00] class: 0 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- weights: [6.30, 29.78] class: 1 | | |--- no_of_special_requests > 2.25 | | | |--- weights: [21.69, 0.00] class: 0
#identifying the most important features
importances = best_model.feature_importances_
indices = np.argsort(importances)
#Visualizing the most important features
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# training performance comparison
models_train_comp_df = pd.concat(
[
decision_tree_perf_train_without.T,
decision_tree_perf_train.T,
decision_tree_tune_perf_train.T,
decision_tree_post_perf_train.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Decision Tree without class_weight",
"Decision Tree with class_weight",
"Decision Tree (Pre-Pruning)",
"decision_tree_post_train",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Decision Tree without class_weight | Decision Tree with class_weight | Decision Tree (Pre-Pruning) | decision_tree_post_train |
|---|---|---|---|---|
| Accuracy | 0.99655 | 0.99655 | 0.64681 | 0.81912 |
| Recall | 0.98791 | 1.00000 | 0.79735 | 0.85879 |
| Precision | 1.00000 | 0.98805 | 0.43522 | 0.63554 |
| F1 | 0.99392 | 0.99399 | 0.56309 | 0.73049 |
# testing performance comparison
models_test_comp_df = pd.concat(
[
decision_tree_perf_test_without.T,
decision_tree_perf_test.T,
decision_tree_tune_perf_test.T,
decision_tree_post_test.T
],
axis=1,
)
models_test_comp_df.columns = [
"Decision Tree without class_weight",
"Decision Tree with class_weight",
"Decision Tree (Pre-Pruning)",
"decision_tree_post_test"
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Decision Tree without class_weight | Decision Tree with class_weight | Decision Tree (Pre-Pruning) | decision_tree_post_test |
|---|---|---|---|---|
| Accuracy | 0.79150 | 0.79488 | 0.64315 | 0.79956 |
| Recall | 0.64298 | 0.63988 | 0.80373 | 0.81572 |
| Precision | 0.64384 | 0.65233 | 0.43985 | 0.61956 |
| F1 | 0.64341 | 0.64604 | 0.56856 | 0.70424 |
With the exception of the decision tree without class_weight and the one with class_weight, which overfit (training F1 of about 0.99 against test F1 of about 0.64), the models generalize well to unseen data: the pre-pruned and post-pruned trees score almost the same on the training and test sets.
However, decision_tree_post_test is the best and will be selected because it has the best F1 score (0.70424) and the best Recall (0.81572) on the test data.
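The selection above can be double-checked programmatically. The sketch below is only an illustration: it rebuilds the test comparison table from the printed values instead of reusing `models_test_comp_df`, and the names `test_scores`, `train_f1`, and `f1_gap` are introduced here and are not part of the notebook.

```python
import pandas as pd

# Test-set metrics copied from the test comparison table above.
test_scores = pd.DataFrame(
    {
        "Decision Tree without class_weight": [0.79150, 0.64298, 0.64384, 0.64341],
        "Decision Tree with class_weight": [0.79488, 0.63988, 0.65233, 0.64604],
        "Decision Tree (Pre-Pruning)": [0.64315, 0.80373, 0.43985, 0.56856],
        "decision_tree_post_test": [0.79956, 0.81572, 0.61956, 0.70424],
    },
    index=["Accuracy", "Recall", "Precision", "F1"],
)

# Training F1 scores from the training comparison table, in the same column order.
train_f1 = pd.Series([0.99392, 0.99399, 0.56309, 0.73049], index=test_scores.columns)

# A large positive gap between training and test F1 points to overfitting.
f1_gap = (train_f1 - test_scores.loc["F1"]).round(5)

# Column with the highest test F1 and the highest test Recall.
best_by_f1 = test_scores.loc["F1"].idxmax()
best_by_recall = test_scores.loc["Recall"].idxmax()

print(f1_gap)
print("Best F1:", best_by_f1)
print("Best Recall:", best_by_recall)
```

The two unpruned trees show an F1 gap of roughly 0.35, while the pruned trees stay within about 0.03, which is the pattern described above.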
# Assigning a variable name to the best decision tree model
Final_decision_Tree_model = best_model
# Keeping a reference to the best logistic regression result; Logistic_regression_model_train_4 holds its training performance metrics (see the comparison below)
Final_Regression_model = Logistic_regression_model_train_4
# training performance comparison
Final_models_train_comp = pd.concat(
[
decision_tree_post_perf_train.T,Logistic_regression_model_train_4.T
],
axis=1,
)
Final_models_train_comp.columns = [
"Decision Tree (Post-Pruning)", "Logistic_regression_model_train_4"
]
print("Training performance comparison:")
Final_models_train_comp
Training performance comparison:
| | Decision Tree (Post-Pruning) | Logistic_regression_model_train_4 |
|---|---|---|
| Accuracy | 0.81912 | 0.78221 |
| Recall | 0.85879 | 0.76204 |
| Precision | 0.63554 | 0.59206 |
| F1 | 0.73049 | 0.66638 |
# testing performance comparison
Final_models_test_comp = pd.concat(
[
decision_tree_post_test.T,Logistic_regression_model_test_4.T
],
axis=1,
)
Final_models_test_comp.columns = [
"decision_tree_post_test", "Logistic_regression_model_test_4"
]
print("Test performance comparison:")
Final_models_test_comp
Test performance comparison:
| | decision_tree_post_test | Logistic_regression_model_test_4 |
|---|---|---|
| Accuracy | 0.79956 | 0.76929 |
| Recall | 0.81572 | 0.75533 |
| Precision | 0.61956 | 0.58134 |
| F1 | 0.70424 | 0.65701 |
I will therefore assign the selected model as my final model and base predictions on it.
# Assigning the selected best model to the final model variable.
final_model = Final_decision_Tree_model
# Plotting the rules of the final decision tree model
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
    final_model,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
# Draw the arrows between nodes in black so they stay visible
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of the final decision tree model
print(tree.export_text(final_model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 150.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 92.50 | | | | |--- no_of_weekend_nights <= 3.50 | | | | | |--- arrival_month <= 9.50 | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- avg_price_per_room <= 178.44 | | | | | | | | | |--- weights: [285.49, 7.01] class: 0 | | | | | | | | |--- avg_price_per_room > 178.44 | | | | | | | | | |--- weights: [2.10, 5.26] class: 1 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- weights: [376.45, 105.10] class: 0 | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | |--- lead_time <= 27.50 | | | | | | | | |--- avg_price_per_room <= 79.10 | | | | | | | | | |--- weights: [139.95, 21.02] class: 0 | | | | | | | | |--- avg_price_per_room > 79.10 | | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | | |--- weights: [39.88, 1.75] class: 0 | | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | | |--- lead_time <= 14.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 14.50 | | | | | | | | | | | |--- weights: [14.69, 1.75] class: 0 | | | | | | | |--- lead_time > 27.50 | | | | | | | | |--- weights: [30.79, 31.53] class: 1 | | | | | |--- arrival_month > 9.50 | | | | | | |--- weights: [585.67, 45.54] class: 0 | | | | |--- no_of_weekend_nights > 3.50 | | | | | |--- weights: [1.40, 10.51] class: 1 | | | |--- lead_time > 92.50 | | | | |--- avg_price_per_room <= 113.53 | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | |--- weights: [16.09, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | |--- lead_time <= 116.50 | | | | | | | | |--- weights: [69.97, 71.82] class: 1 | | | | | | | |--- lead_time > 116.50 | | | | | | | | |--- avg_price_per_room <= 96.45 | | | | | | | | | |--- avg_price_per_room <= 89.88 | | | | | | | 
| | | |--- weights: [54.58, 22.77] class: 0 | | | | | | | | | |--- avg_price_per_room > 89.88 | | | | | | | | | | |--- weights: [7.00, 14.01] class: 1 | | | | | | | | |--- avg_price_per_room > 96.45 | | | | | | | | | |--- weights: [16.79, 0.00] class: 0 | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | |--- weights: [30.09, 1.75] class: 0 | | | | |--- avg_price_per_room > 113.53 | | | | | |--- weights: [13.29, 29.78] class: 1 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 3.50 | | | | |--- avg_price_per_room <= 135.83 | | | | | |--- weights: [298.08, 52.55] class: 0 | | | | |--- avg_price_per_room > 135.83 | | | | | |--- lead_time <= 2.50 | | | | | | |--- avg_price_per_room <= 178.78 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- weights: [31.49, 17.52] class: 0 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- weights: [31.49, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 178.78 | | | | | | | |--- weights: [9.10, 17.52] class: 1 | | | | | |--- lead_time > 2.50 | | | | | | |--- weights: [5.60, 21.02] class: 1 | | | |--- lead_time > 3.50 | | | | |--- avg_price_per_room <= 105.28 | | | | | |--- lead_time <= 24.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- weights: [52.48, 0.00] class: 0 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- weights: [42.68, 54.30] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [24.49, 89.34] class: 1 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | |--- weights: [23.79, 5.26] class: 0 | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | |--- weights: [60.88, 68.32] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [77.67, 0.00] class: 0 | | | | | |--- lead_time > 24.50 
| | | | | | |--- avg_price_per_room <= 56.55 | | | | | | | |--- arrival_date <= 15.50 | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | |--- weights: [11.90, 3.50] class: 0 | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | |--- weights: [4.90, 17.52] class: 1 | | | | | | | |--- arrival_date > 15.50 | | | | | | | | |--- weights: [22.39, 1.75] class: 0 | | | | | | |--- avg_price_per_room > 56.55 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- avg_price_per_room <= 72.78 | | | | | | | | | |--- lead_time <= 42.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- weights: [16.79, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- weights: [2.10, 5.26] class: 1 | | | | | | | | | |--- lead_time > 42.50 | | | | | | | | | | |--- weights: [36.39, 59.56] class: 1 | | | | | | | | |--- avg_price_per_room > 72.78 | | | | | | | | | |--- weights: [216.92, 632.37] class: 1 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- weights: [83.97, 432.67] class: 1 | | | | |--- avg_price_per_room > 105.28 | | | | | |--- arrival_month <= 10.50 | | | | | | |--- lead_time <= 9.50 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- weights: [32.89, 136.63] class: 1 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- avg_price_per_room <= 161.83 | | | | | | | | | |--- weights: [20.99, 8.76] class: 0 | | | | | | | | |--- avg_price_per_room > 161.83 | | | | | | | | | |--- weights: [2.10, 12.26] class: 1 | | | | | | |--- lead_time > 9.50 | | | | | | | |--- weights: [317.68, 2226.42] class: 1 | | | | | |--- arrival_month > 10.50 | | | | | | |--- lead_time <= 22.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- weights: [9.80, 12.26] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [32.89, 0.00] class: 0 | | | | | | |--- lead_time > 22.50 | | | | | | | |--- weights: [30.09, 87.59] class: 1 
| |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- lead_time <= 6.50 | | | | |--- no_of_week_nights <= 5.50 | | | | | |--- weights: [739.61, 68.32] class: 0 | | | | |--- no_of_week_nights > 5.50 | | | | | |--- weights: [3.50, 7.01] class: 1 | | | |--- lead_time > 6.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 91.50 | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | |--- weights: [282.69, 1.75] class: 0 | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | |--- weights: [4.90, 7.01] class: 1 | | | | | |--- lead_time > 91.50 | | | | | | |--- weights: [54.58, 22.77] class: 0 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 118.74 | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | |--- lead_time <= 42.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [62.28, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- weights: [473.72, 176.92] class: 0 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [86.07, 0.00] class: 0 | | | | | | | |--- lead_time > 42.50 | | | | | | | | |--- avg_price_per_room <= 66.49 | | | | | | | | | |--- weights: [62.28, 3.50] class: 0 | | | | | | | | |--- avg_price_per_room > 66.49 | | | | | | | | | |--- avg_price_per_room <= 79.26 | | | | | | | | | | |--- avg_price_per_room <= 71.87 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 71.87 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 79.26 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- 
arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | |--- weights: [5.60, 38.54] class: 1 | | | | | |--- avg_price_per_room > 118.74 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | | |--- avg_price_per_room <= 135.58 | | | | | | | | | | | |--- weights: [97.26, 24.52] class: 0 | | | | | | | | | | |--- avg_price_per_room > 135.58 | | | | | | | | | | | |--- weights: [125.25, 80.58] class: 0 | | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | | |--- weights: [1.40, 8.76] class: 1 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- lead_time <= 35.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- weights: [57.38, 17.52] class: 0 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- weights: [67.87, 71.82] class: 1 | | | | | | | | | |--- lead_time > 35.50 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- weights: [9.80, 36.79] class: 1 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- lead_time <= 17.50 | | | | | | | | | |--- weights: [67.17, 38.54] class: 0 | | | | | | | | |--- lead_time > 17.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- lead_time <= 126.00 | | | | | | | | | | | |--- weights: [128.75, 162.91] class: 1 | | | | | | | | | | |--- lead_time > 126.00 | | | | | | | | | | | |--- weights: [8.40, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [68.57, 147.14] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- lead_time <= 102.50 | | | | | | | | |--- weights: [50.38, 0.00] class: 0 | | | | | | | |--- lead_time > 102.50 
| | | | | | | | |--- weights: [0.70, 10.51] class: 1 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- weights: [1345.58, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_special_requests <= 2.25 | | | | | | |--- lead_time <= 8.50 | | | | | | | |--- weights: [34.29, 3.50] class: 0 | | | | | | |--- lead_time > 8.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | |--- weights: [16.09, 1.75] class: 0 | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | |--- weights: [67.87, 63.06] class: 0 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- weights: [46.18, 12.26] class: 0 | | | | | |--- no_of_special_requests > 2.25 | | | | | | |--- weights: [42.68, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.25 | | | | | |--- arrival_month <= 10.50 | | | | | | |--- arrival_date <= 26.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- avg_price_per_room <= 76.37 | | | | | | | | | |--- weights: [17.49, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 76.37 | | | | | | | | | |--- avg_price_per_room <= 82.52 | | | | | | | | | | |--- weights: [9.10, 15.77] class: 1 | | | | | | | | | |--- avg_price_per_room > 82.52 | | | | | | | | | | |--- weights: [143.44, 36.79] class: 0 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- weights: [9.10, 10.51] class: 1 | | | | | | |--- arrival_date > 26.50 | | | | | | | |--- weights: [23.09, 22.77] class: 0 | | | | | |--- arrival_month > 10.50 | | | | | | |--- weights: [37.09, 45.54] class: 1 | | | | |--- no_of_special_requests > 2.25 | | | | | |--- weights: [51.08, 0.00] class: 0 |--- lead_time > 150.50 | |--- avg_price_per_room <= 99.89 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 272.00 | | | | |--- no_of_special_requests <= 
0.50 | | | | | |--- avg_price_per_room <= 81.76 | | | | | | |--- lead_time <= 161.50 | | | | | | | |--- weights: [21.69, 0.00] class: 0 | | | | | | |--- lead_time > 161.50 | | | | | | | |--- weights: [57.38, 31.53] class: 0 | | | | | |--- avg_price_per_room > 81.76 | | | | | | |--- weights: [23.09, 31.53] class: 1 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- lead_time <= 151.50 | | | | | | |--- weights: [0.70, 3.50] class: 1 | | | | | |--- lead_time > 151.50 | | | | | | |--- weights: [53.88, 5.26] class: 0 | | | |--- lead_time > 272.00 | | | | |--- weights: [32.19, 45.54] class: 1 | | |--- market_segment_type_Online > 0.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- weights: [9.80, 660.39] class: 1 | | | |--- no_of_special_requests > 0.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- lead_time <= 180.50 | | | | | | |--- weights: [32.89, 10.51] class: 0 | | | | | |--- lead_time > 180.50 | | | | | | |--- no_of_special_requests <= 2.25 | | | | | | | |--- weights: [7.00, 157.65] class: 1 | | | | | | |--- no_of_special_requests > 2.25 | | | | | | | |--- weights: [5.60, 0.00] class: 0 | | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- avg_price_per_room <= 81.81 | | | | | | | |--- weights: [59.48, 14.01] class: 0 | | | | | | |--- avg_price_per_room > 81.81 | | | | | | | |--- weights: [107.76, 99.85] class: 0 | | | | | |--- arrival_month > 11.50 | | | | | | |--- weights: [17.49, 36.79] class: 1 | |--- avg_price_per_room > 99.89 | | |--- no_of_special_requests <= 2.25 | | | |--- arrival_month <= 11.50 | | | | |--- weights: [4.20, 1660.62] class: 1 | | | |--- arrival_month > 11.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- weights: [25.89, 0.00] class: 0 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- weights: [6.30, 29.78] class: 1 | | |--- no_of_special_requests > 2.25 | | | |--- weights: [21.69, 0.00] class: 0
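Each leaf in the text report lists `weights: [class 0, class 1]`. The values are fractional, which suggests they are class-weighted sums and not raw booking counts, so the shares computed below are best read as weighted purity, not as observed cancellation rates. The helper `leaf_class_shares` is a hypothetical name introduced for this sketch.

```python
def leaf_class_shares(weights):
    """Return each class's share of the total weight in a leaf."""
    total = sum(weights)
    if total == 0:
        raise ValueError("leaf has zero total weight")
    return [w / total for w in weights]


# Two leaves copied from the printed rules above.
leaves = {
    "lead_time > 150.50, avg_price_per_room > 99.89, arrival_month <= 11.50": [4.20, 1660.62],
    "no_of_special_requests > 1.50, ..., arrival_date > 6.50": [67.87, 63.06],
}

for rule, weights in leaves.items():
    class_0, class_1 = leaf_class_shares(weights)
    print(f"{rule}: class 0 = {class_0:.1%}, class 1 = {class_1:.1%}")
```

The first leaf is almost pure class 1, whereas the second is close to an even split, so a prediction from the second leaf deserves much less confidence even though both print a single class label.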
#identifying the most important features in the final model
importances = final_model.feature_importances_
indices = np.argsort(importances)
#Visualizing the most important features in the final model
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
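Alongside the bar chart, a ranked text listing can make the top features easier to quote. This is a minimal sketch: `rank_features` is a hypothetical helper, and the demo names and values are placeholders only. In the notebook the inputs would be `feature_names` and `final_model.feature_importances_`.

```python
def rank_features(feature_names, importances, top_k=None):
    """Pair each feature with its importance and sort from most to least important."""
    if len(feature_names) != len(importances):
        raise ValueError("feature_names and importances must have the same length")
    ranked = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Placeholder inputs for illustration only (not results from this notebook).
demo_names = ["feature_a", "feature_b", "feature_c"]
demo_importances = [0.2, 0.5, 0.3]

for name, value in rank_features(demo_names, demo_importances, top_k=2):
    print(f"{name}: {value:.3f}")
```

The plot uses `np.argsort`, which sorts in ascending order so that the largest bar ends up at the top of the horizontal chart. The listing here sorts in descending order because a text table is read from the top.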
Bookings with a lead time over 150 days (about 5 months) are likely to be cancelled. Five months is a long time to hold a booking: during this period the average room price may have dropped, prompting the guest to rebook, or the guest's plans may have been overtaken by events so that the booking is no longer needed.
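Guests with no special requests are more likely to cancel. In the rules printed above, the largest class 1 leaf (weights [317.68, 2226.42]) sits under no_of_special_requests <= 0.50, while the visible leaves under no_of_special_requests > 2.25 are all class 0. This may be because the hotel does not offer the services these guests would otherwise have requested.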
Guests who book online are more likely to cancel. The ease of cancelling online and the anonymity of online booking may explain this. Another possibility is that the hotel's webpage has fewer enticing elements than competitors' pages to keep guests interested.
Bookings with an average room price above about 100 euros are more likely to be cancelled. These customers may be finding better deals at competing hotels in the same price range or lower.
Guests whose bookings fall after September have a higher chance of cancelling before check-in. Demand for hotel rooms is observed to drop after September, so guests can pick the best bargains among hotels that offer discounts to attract guests while demand is low.
Guests who have not selected a meal plan are more likely to cancel. These guests have probably not planned their stay in detail and may cancel at any point, especially if they find better deals at other hotels.
Guests who book fewer than four weekend nights are the least likely to cancel. These guests may have planned their stay fully and researched the hotel before booking. They may include travellers and holidaymakers.
The penalty fee for cancelling a booking with a lead time over 5 months should be raised to a level that deters cancellation. It should carry the highest cancellation charge, especially when check-in was due in July, August, or September, the peak months for guest arrivals, when a cancellation is most likely to cause financial loss to the hotel.
The hotel should increase the number of additional services it offers to a level that matches or surpasses other hotels in the same city.
The hotel's webpage should be refreshed to be more intuitive and attractive to guests. The page should showcase all the hotel's services so that guests can easily find them. It should also have a dedicated 24-hour customer service chat or help desk (possibly AI-powered) where guests can make inquiries. Discounts should be posted on the webpage regularly, and the hotel should email its latest discounts to past customers and to visitors who enter their email address on the landing page.
The hotel should review the quality of the services and rooms it offers at 100 euros or above so that they match or surpass what competing hotels in the city offer, and it should offer better deals in this price range.
More deals and discounts should be rolled out in October, November, and December.
The hotel should run a campaign or investigation to identify the main factors that draw holiday-seekers and long-term guests to hotels in the city. This may involve collaborating with travel agencies to collect more data on these guests, or running online polls. Once identified, these factors should be incorporated into the hotel's strategic plans to attract guests.
Further information, such as demographic data, sex, occupation, and purpose of visit, is needed to gain more insight into the factors behind booking cancellations.
Further data should also include more repeat clients to give a more rounded picture; the few included in this data set were duplicates and were removed during preprocessing.
Online bookings should attract a cancellation charge of 2% of the initial booking cost, with refunds taking 2 weeks to process. For bookings with lead times over 5 months, a 4-week refund cool-off period should apply, and the cancellation charge should be up to 5% of the booking cost.
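The proposed charges can be written down as a small rule to make the arithmetic concrete. This is a sketch under stated assumptions, not a finished policy: "over 5 months" is taken as the tree's 150-day split, the 5% figure is applied at its ceiling, the long-lead rule is assumed to take precedence over the online rule, and bookings covered by neither rule return `None` because no terms are proposed for them. All names below are hypothetical.

```python
from typing import NamedTuple, Optional

LONG_LEAD_DAYS = 150      # assumption: "over 5 months" read as the 150-day split
LONG_LEAD_RATE = 0.05     # assumption: the "up to 5%" charge applied at its ceiling
ONLINE_RATE = 0.02


class CancellationTerms(NamedTuple):
    charge: float        # amount retained by the hotel
    refund_weeks: int    # weeks before the refund is processed


def cancellation_terms(
    booking_cost: float, lead_time_days: int, booked_online: bool
) -> Optional[CancellationTerms]:
    """Return the proposed charge and refund delay, or None if no rule applies."""
    if booking_cost < 0:
        raise ValueError("booking_cost must be non-negative")
    # Assumption: the long-lead rule overrides the online rule when both apply.
    if lead_time_days > LONG_LEAD_DAYS:
        return CancellationTerms(round(booking_cost * LONG_LEAD_RATE, 2), 4)
    if booked_online:
        return CancellationTerms(round(booking_cost * ONLINE_RATE, 2), 2)
    return None


print(cancellation_terms(1000, 200, True))   # long lead time
print(cancellation_terms(1000, 30, True))    # online, short lead time
print(cancellation_terms(1000, 30, False))   # no proposed terms
```

For a 1000 euro booking this gives a 50 euro charge with a 4-week refund for the long-lead case and a 20 euro charge with a 2-week refund for a short-lead online booking.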